Question

Is it necessary to do FastQC before using RSEM?

0

5.9 years ago

John ▴ 270

Hi

In papers from high-profile journals, I see that some authors ran FastQC, Trim Galore, and Trinity to pre-process the FASTQ files of their RNA-seq reads, while others did not.

Is preprocessing really necessary, or can we use raw FASTQ files directly in RSEM?

Thank you in advance

RNA-Seq R alignment • 2.2k views

updated 5.9 years ago by GenoMax 153k • written 5.9 years ago by John ▴ 270

1

I ran into a similar problem a few weeks ago. I tried an experiment where I picked a few FASTQ pairs with moderately high adapter content and ran them through my pipeline, which involves RSEM+STAR, both before and after adapter trimming, then evaluated the results using DESeq2. The differences between the trimmed and untrimmed runs were not statistically significant at all. However, my pipeline does have a k-mer-based host/graft read separation algorithm before RSEM, so the results may not be 100% indicative of just RSEM/STAR's compensation techniques.
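A quick first look at such a before/after comparison can be sketched in Python. This is not the DESeq2 analysis described above, only a rough per-gene screen; the function name `flag_changed_genes`, the threshold, the pseudocount, and the example counts are all made up for illustration.

```python
import math


def flag_changed_genes(untrimmed, trimmed, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose counts shift notably.

    untrimmed / trimmed: dicts mapping gene ID to a count for the same sample,
    quantified without and with adapter trimming.
    """
    flagged = {}
    for gene in sorted(set(untrimmed) | set(trimmed)):
        # The pseudocount avoids division by zero for genes with no reads.
        before = untrimmed.get(gene, 0) + pseudocount
        after = trimmed.get(gene, 0) + pseudocount
        log2fc = math.log2(after / before)
        if abs(log2fc) >= min_abs_log2fc:
            flagged[gene] = log2fc
    return flagged


# Hypothetical counts for one sample (illustration only, not real data).
untrimmed_counts = {"geneA": 99, "geneB": 49, "geneC": 7}
trimmed_counts = {"geneA": 99, "geneB": 51, "geneC": 31}
changed = flag_changed_genes(untrimmed_counts, trimmed_counts)
```

Genes flagged this way are only candidates for a closer look; a proper test with replicates would still be needed to call a difference significant.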

5.9 years ago by Ram 45k

1

It really depends on the dataset. I have experimented with read trimming before STAR. Most of the time, the impact is very minor, but I have also seen instances where gene counts are substantially different.

5.9 years ago by igor 13k

score 3 · Answer 1 · 2019-09-14

FastQC is a quality assessment program. It does not change the data, so in a strict sense it is not necessary to run it. That said, FastQC provides a bird's-eye view of your data and can alert you to possible issues (e.g. presence of adapter dimers, duplication in your data, etc.). Interpret FastQC results in the context of your experiment, though: failing a category in FastQC does not automatically mean your data are bad.

You may find this series of blog posts from the FastQC authors of interest as you check your FastQC results.
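To illustrate what "assessment without changes" means, here is a minimal Python sketch of one kind of summary a QC tool reports: the mean base quality at each read position. It is not FastQC's implementation; the function name `per_position_mean_quality` and the two toy reads are invented for this example, and it assumes plain 4-line FASTQ records with Phred+33 quality encoding.

```python
import io


def per_position_mean_quality(fastq_handle, offset=33):
    """Return the mean Phred quality at each read position.

    Read-only: the FASTQ records are inspected, never modified.
    Assumes 4-line FASTQ records with Phred+33 quality encoding.
    """
    sums = []
    counts = []
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 != 3:  # the 4th line of each record holds the qualities
            continue
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(sums):  # first time a read reaches this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


# Two toy reads (illustration only): 'I' encodes Q40 and '#' encodes Q2.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACG\n+\nII#\n"
)
means = per_position_mean_quality(example)
```

The input is only read, never rewritten, which is why running or skipping this kind of step has no effect on what RSEM later sees.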

The same goes for Trim Galore or a similar trimming program. Most aligners can handle some adapter contamination and will drop those sequences from alignments. If you are going to do any de novo analysis, however, it is imperative that you clean your data of extraneous sequence.
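For a sense of what a trimming step does to a read, here is a deliberately simplified Python sketch of 3' adapter removal. It is not how Trim Galore (which wraps Cutadapt) works internally: real trimmers tolerate mismatches and also trim by quality, while this version uses exact matching only. The function name `trim_adapter` and the `min_overlap` default are assumptions made for this example; the default adapter is the common Illumina adapter prefix.

```python
def trim_adapter(read, adapter="AGATCGGAAGAGC", min_overlap=3):
    """Remove a 3' adapter from a read (exact matching only).

    First looks for the full adapter anywhere in the read, then for a
    partial adapter (at least min_overlap bases) at the very end.
    """
    idx = read.find(adapter)
    if idx != -1:
        # Full adapter found: keep only the sequence before it.
        return read[:idx]
    # Otherwise check whether the read ends with a prefix of the adapter,
    # trying the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


# Toy reads (illustration only).
full = trim_adapter("ACGTACGTAGATCGGAAGAGCTTTT")  # adapter plus trailing bases
partial = trim_adapter("ACGTACGTAGATC")           # read ends mid-adapter
clean = trim_adapter("ACGTACGT")                  # no adapter present
```

Without trimming, an assembler would treat the leftover adapter bases as real sequence, which is the reason cleaning matters far more for de novo work than for alignment to a reference.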

Trinity is only needed if you wish to de novo assemble your data set. If you have a genome/transcriptome available then you don't need to go that route at all.