Need consultation about my RNA-seq workflow
markusz, 8 months ago

Hi. I'm three weeks into analyzing gene expression from FASTQ files. After at least 20 attempts I keep getting high p- and q-values (compared with the results the company gave me), and I don't know what to do next. I have an 8-step workflow:

  1. FastQC to check how the raw reads need to be cleaned.
  2. Trimmomatic to clean them.
  3. FastQC again to check whether trimming worked. If not, go back to step 2.
  4. Indexing the genome with STAR, based on the .fa and .gtf files from Ensembl.
  5. Mapping with STAR.
  6. Counting and preparing for ballgown with StringTie, using the .gtf file.
  7. Using prepDE.py3 to create a gene count table.
  8. Analysis of the results with ballgown and DESeq2 in R.
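For reference, steps 4 to 7 above could be expressed roughly as the command lines below. This is a minimal Python sketch that only builds the argument lists and runs nothing; the helper names, file paths, sample name and the 150 bp read length are hypothetical placeholders, while the flags are standard STAR, StringTie and prepDE.py3 options. Two settings worth double-checking in this kind of workflow are --sjdbOverhang (conventionally read length minus 1) and prepDE.py3's -l read-length option, which, as far as I know, defaults to 75 and is used to convert coverage into approximate counts.

```python
"""Sketch: build (but do not run) the commands for workflow steps 4-7.

All paths, sample names and the 150 bp read length are placeholders.
"""
import shlex


def star_index_cmd(genome_dir, fasta, gtf, read_length=150, threads=8):
    # Step 4: build the STAR index once per genome + annotation pair.
    # --sjdbOverhang is conventionally read length minus 1.
    return ["STAR", "--runMode", "genomeGenerate",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            "--sjdbOverhang", str(read_length - 1)]


def star_align_cmd(genome_dir, r1, r2, prefix, threads=8):
    # Step 5: map one paired-end sample; r1 and r2 must be mates of the same sample.
    return ["STAR", "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", r1, r2,
            "--readFilesCommand", "zcat",  # assumes gzipped FASTQ input
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", prefix]


def stringtie_cmd(bam, gtf, out_gtf):
    # Step 6: -e limits estimation to reference transcripts given with -G,
    # -B writes ballgown tables next to out_gtf.
    return ["stringtie", bam, "-e", "-B", "-G", gtf, "-o", out_gtf]


def prepde_cmd(sample_list, read_length=150):
    # Step 7: prepDE.py3 derives counts from coverage, so -l should be the
    # real read length rather than the default.
    return ["prepDE.py3", "-i", sample_list, "-l", str(read_length)]


if __name__ == "__main__":
    # Hypothetical sample "s1"; STAR names its sorted BAM <prefix>Aligned.sortedByCoord.out.bam.
    steps = [
        star_index_cmd("star_index", "genome.fa", "genes.gtf"),
        star_align_cmd("star_index", "s1_R1.fq.gz", "s1_R2.fq.gz", "s1/"),
        stringtie_cmd("s1/Aligned.sortedByCoord.out.bam", "genes.gtf", "ballgown/s1/s1.gtf"),
        prepde_cmd("sample_list.txt"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) once the paths point at real files; keeping the commands as data makes it easy to confirm that every run used the same settings.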

So far I've tried many different options and parameter values in every program, and nothing has really worked well. Maybe I'm just dumb; that's one possibility. But I hope someone can help me understand what I'm doing wrong. I have Illumina NovaSeq 6000 paired-end reads.

You should find a local expert to consult/work with. There are too many points where things can go wrong for us to guess at your issue(s) based on the little info provided.

Just double checking here:

with low p and q values (compared to results company gave me)

Low p- and q-values would be good (they indicate significantly DE genes); I guess you mean few significant results?
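Since the thread hinges on p- versus q-values: DESeq2's padj column is, by default, a Benjamini-Hochberg (BH) adjustment of the raw p-values (applied after independent filtering), so a dataset can have many genes with p < 0.05 but only a few with padj < 0.05. The Python sketch below reimplements the BH step for illustration only; the function name and the toy p-values are made up, and in practice you would use DESeq2's padj directly.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum enforces
    # monotonicity: padj(k) = min over j >= k of m * p(j) / j, capped at 1.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.20]  # toy values, not real data
    padj = benjamini_hochberg(pvals)
    print("p < 0.05:", sum(p < 0.05 for p in pvals))     # 3 genes
    print("padj < 0.05:", sum(q < 0.05 for q in padj))   # only 1 gene
```

With these four toy values, three genes pass p < 0.05 but only one passes padj < 0.05, which is the kind of gap that makes "few significant results" common even when raw p-values look promising.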

Yes, that's what I meant. Thanks for correcting me. I've edited the post so it doesn't confuse anyone.

Michael, 8 months ago

Hi Markusz,

In addition to what Mensur recommended, I would encourage you to simplify your workflow. Overcomplicating it only makes things more frustrating than they need to be. Iterative QC and trimming in particular are rarely required these days, especially with modern instruments like the NovaSeq 6000 or NovaSeq X. I propose the following workflow:

  1. Index the genome with STAR (exactly once per genome).
  2. A single round of trimming and QC with fastp (this step is almost optional too).
  3. Mapping and counting with STAR in one go (--quantMode GeneCounts). If you need more control over the counting process, add optional step 3b.
     (3b.) featureCounts to build the count matrix.
  4. DE analysis with DESeq2 in R.
  5. Optionally, summarize your QC output (fastp and STAR mapping reports) with MultiQC into one compact report.
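With --quantMode GeneCounts, STAR writes one ReadsPerGene.out.tab per sample: four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene, where column 2 holds unstranded counts and columns 3 and 4 hold counts for the two possible strandedness settings. Using the wrong column is an easy way to lose signal, so the sketch below parses these files into a count matrix and guesses the library strandedness. It is a minimal Python illustration; the function names, the 4x ratio threshold in guess_strandedness and the toy table are assumptions, not STAR features.

```python
import io

# Column of ReadsPerGene.out.tab for each library type (column 0 is the gene ID).
STRAND_COLUMN = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_star_gene_counts(lines, strandedness="unstranded"):
    """Parse one ReadsPerGene.out.tab; return (gene_counts, summary_counts)."""
    col = STRAND_COLUMN[strandedness]
    counts, summary = {}, {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        target = summary if fields[0].startswith("N_") else counts
        target[fields[0]] = int(fields[col])
    return counts, summary


def guess_strandedness(lines, ratio=4.0):
    """Rule of thumb (an assumption): in a stranded library one strand column dominates."""
    lines = list(lines)
    fwd = sum(read_star_gene_counts(lines, "forward")[0].values())
    rev = sum(read_star_gene_counts(lines, "reverse")[0].values())
    if rev > ratio * fwd:
        return "reverse"
    if fwd > ratio * rev:
        return "forward"
    return "unstranded"


def build_count_matrix(samples, strandedness="unstranded"):
    """samples: {sample_name: iterable of lines}; returns (gene_ids, {sample: counts})."""
    per_sample = {name: read_star_gene_counts(lines, strandedness)[0]
                  for name, lines in samples.items()}
    gene_sets = {frozenset(c) for c in per_sample.values()}
    if len(gene_sets) != 1:
        raise ValueError("samples were counted against different annotations")
    genes = sorted(next(iter(gene_sets)))
    return genes, {name: [c[g] for g in genes] for name, c in per_sample.items()}


# Toy table in STAR's layout (made-up numbers for a reverse-stranded library).
EXAMPLE = ("N_unmapped\t10\t10\t10\nN_multimapping\t5\t5\t5\n"
           "N_noFeature\t3\t40\t4\nN_ambiguous\t2\t0\t1\n"
           "ENSG01\t100\t3\t97\nENSG02\t50\t1\t49\n")

if __name__ == "__main__":
    mode = guess_strandedness(io.StringIO(EXAMPLE))
    genes, matrix = build_count_matrix({"s1": io.StringIO(EXAMPLE)}, mode)
    print(mode, genes, matrix)
```

The resulting matrix (genes as rows, samples as columns) is the shape DESeq2's DESeqDataSetFromMatrix expects; if the strandedness guess is ambiguous, checking the library prep kit is more reliable than any heuristic.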

Eijy Nagai, 8 months ago

Hi Markusz,

I also agree that, with the information currently provided, it is difficult to identify the main bottlenecks or even the outcome of your tests. What do you mean by it not working out well? Did you not find any DEGs? I also assume you confirmed that the experimental part worked well.
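To answer "did you find any DEGs?" concretely, and to compare against the list the company provided, it helps to summarize the DESeq2 results table the same way every time. The sketch below assumes the results were exported from R with write.csv(as.data.frame(res)), which writes gene IDs in a first column with an empty header and missing values as NA; the function names, thresholds and example rows are hypothetical. Genes whose padj is NA (for example, removed by independent filtering) are reported separately rather than silently dropped.

```python
import csv
import io


def summarize_deseq2_csv(handle, alpha=0.05, min_abs_lfc=0.0):
    """Summarize a DESeq2 results table saved with write.csv(as.data.frame(res))."""
    reader = csv.DictReader(handle)
    id_col = reader.fieldnames[0]  # write.csv leaves the gene-ID header empty
    summary = {"tested": 0, "padj_na": 0, "up": 0, "down": 0}
    significant = []
    for row in reader:
        summary["tested"] += 1
        if row["padj"] in ("NA", ""):
            summary["padj_na"] += 1  # e.g. removed by independent filtering
            continue
        padj, lfc = float(row["padj"]), float(row["log2FoldChange"])
        if padj < alpha and abs(lfc) >= min_abs_lfc:
            summary["up" if lfc > 0 else "down"] += 1
            significant.append(row[id_col])
    return summary, significant


def compare_gene_lists(mine, theirs):
    """Overlap between two DEG lists (e.g. yours vs. the company's)."""
    a, b = set(mine), set(theirs)
    return {"shared": len(a & b), "only_mine": len(a - b), "only_theirs": len(b - a)}


# Made-up rows in the layout write.csv produces.
EXAMPLE = '''"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"
"ENSG01",500,2.1,0.3,7,1e-12,1e-09
"ENSG02",300,-1.5,0.4,-3.75,0.0002,0.01
"ENSG03",2,0.5,1.2,0.4,0.7,NA
"ENSG04",800,0.2,0.1,2,0.04,0.2
'''

if __name__ == "__main__":
    summary, degs = summarize_deseq2_csv(io.StringIO(EXAMPLE))
    print(summary, degs)
    print(compare_gene_lists(degs, ["ENSG01", "ENSG05"]))
```

Reporting these few numbers (genes tested, NA padj, up, down, overlap with the company's list) when asking for help makes "it didn't work out well" much easier for others to diagnose.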

Have you tested the simple pipeline suggested by Michael?

If you would like to quickly try another pipeline, we recently published a protocol for which you only need Docker and our RumBall Docker image; you then follow the scripts, which come with easy explanations, to adapt it to your FASTQ files. We do use STAR and DESeq2. Please have a look at https://star-protocols.cell.com/protocols/3354

Best of luck,

Eijy
