Question

Need consultation about my rna-seq workflow

0

7 months ago

markusz ▴ 10

Hi. I'm 3 weeks into trying to work my way through analyzing gene expression from FASTQ files. At this point, after trying at least 20 times and getting high p and q values (compared to the results the company gave me), I don't know what to do. I have an 8-step workflow:

1. FastQC to check how to clean the raw reads.
2. Trimmomatic to clean them.
3. FastQC again to check whether it worked. If not, go back to point 2.
4. Indexing the genome with STAR based on .fa and .gtf files from Ensembl.
5. Mapping with STAR.
6. Counting and preparing for ballgown with StringTie using the .gtf file.
7. Using prepDE.py3 to create the gene count table.
8. Analysis of results with ballgown and DESeq2 in R.

So far I've tried a lot of different options and values in every program used, and nothing has really worked out well. Maybe I'm just dumb, that's one possibility. But I hope someone can help me understand what I'm doing wrong. I have Illumina NovaSeq 6000 paired-end reads.

RNA-seq Gene-expression • 1.1k views

ADD COMMENT • link updated 7 months ago by Ram 44k • written 7 months ago by markusz ▴ 10

2

You should find a local expert to consult/work with. There are too many points where things can go wrong for us to guess at your issue(s) based on the little info provided.

ADD REPLY • link 7 months ago by jared.andrews07 ★ 18k

1

Just double checking here:

with low p and q values (compared to results company gave me)

Low p and q values would be good (meaning significantly DE genes). I guess you mean few significant results?

ADD REPLY • link 7 months ago by Michael 55k

0

Yes, that's what I meant. Thanks for correcting me. I've edited the post so it doesn't confuse anyone.

ADD REPLY • link 7 months ago by markusz ▴ 10

score 6 · Answer 1 · 2024-03-19

As you were already told, this is too time-consuming to explain or diagnose step by step over the internet. However, there are tutorials and papers that contain detailed descriptions:

score 6 · Answer 2 · 2024-03-20

Hi Markusz,

In addition to what Mensur recommended, I would encourage you to try to simplify your workflow. Overcomplicating the workflow just makes things more frustrating than they need to be. Iterative QC and trimming in particular are rarely required these days, especially with modern equipment like the NovaSeq 6000 or X. I propose the following workflow:

1. Index the genome with STAR (exactly once per genome).
2. Single round of trimming and QC: fastp (this step is almost optional as well).
3. Mapping and counting with STAR in one go (--quantMode GeneCounts). If you need more control over the counting process, add optional step 3b.
3b. (Optional) featureCounts to make the count matrix.
4. DE analysis with DESeq2 in R.
5. Optionally: summarize your QC output (fastp, STAR mapping reports) with MultiQC into one compact report.
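
One place where this workflow can quietly go wrong is between counting and the DE analysis: STAR's --quantMode GeneCounts writes one ReadsPerGene.out.tab file per sample, with four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene, and three count columns (unstranded, forward-stranded, reverse-stranded). Picking the column that does not match the library prep can leave most genes with very low counts. Below is a minimal Python sketch of merging those files into a single count matrix. The function names, sample names, and file paths are hypothetical, and the strandedness of your library is something you have to find out from the kit or the provider; this is an illustration, not a replacement for checking the files yourself.

```python
import csv

# Column index in ReadsPerGene.out.tab for each library type:
# 1 = unstranded, 2 = read 1 on the same strand as the RNA, 3 = reverse-stranded.
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_counts(path, strandedness="unstranded"):
    """Return {gene_id: count} from one STAR ReadsPerGene.out.tab file."""
    col = STRAND_COLUMN[strandedness]
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and the summary rows (N_unmapped, N_multimapping, ...).
            if not row or row[0].startswith("N_"):
                continue
            counts[row[0]] = int(row[col])
    return counts


def build_count_matrix(sample_files, strandedness="unstranded"):
    """Merge per-sample counts.

    sample_files maps a (hypothetical) sample name to its ReadsPerGene.out.tab path.
    Returns (genes, samples, rows), where rows[i][j] is the count of genes[i]
    in samples[j]. Genes missing from a sample are filled with 0.
    """
    per_sample = {
        name: read_star_counts(path, strandedness)
        for name, path in sample_files.items()
    }
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


def write_count_matrix(out_path, genes, samples, rows):
    """Write a tab-separated matrix with a gene_id column and one column per sample."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id", *samples])
        for gene, row in zip(genes, rows):
            writer.writerow([gene, *row])
```

The resulting table can be read into R and passed to DESeq2 as the count matrix. A quick sanity check before that: compare the N_noFeature values across the three columns of one file. If one stranded column has far fewer N_noFeature reads than the other, that is usually the column matching your library.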

score 0 · Answer 3 · 2024-03-20

Hi Markusz,

I also agree that, with the information currently provided, it is difficult to identify the main bottlenecks or even the outcome of your tests so far. What do you mean by "it didn't work out well"? Did you not find any DEGs? I also assume you have confirmed that the experimental part worked well.

Have you tested the simple pipeline suggested by Michael?

If you would like to quickly check another pipeline, we recently published our protocol, for which you just need Docker and our RumBall Docker image. You then follow the scripts, which come with easy explanations, and adapt them to your FASTQ files. We do use STAR and DESeq2. Please have a look at https://star-protocols.cell.com/protocols/3354

Best of luck,

Eijy