5.4 years ago
mschmid
Hello
Do some of you remove reads originating from rRNA in silico before testing for differential expression?
Why or why not? Is it worth it? Or does it depend on the analysis pipeline, in your opinion?
My data is based on poly-A enrichment of mRNA and about 2-5% of the reads (Illumina SE 100bp) are from rRNA operons.
I plan to use the HISAT2 > StringTie > Ballgown workflow as a first strategy to test for DE. I might later try different methods, depending on the Ballgown results.
As one typically quantifies reads against a transcriptome or GTF file, and neither should include rRNA, one does not explicitly remove them: since they are not represented in the reference, they are not counted anyway. I realize that the workflow you mention is prominent because it was published in a high-profile venue by reputable people, but I still do not see why one should use it. StringTie assembles transcriptomes, so unless you really need that I would avoid it. Ballgown also seems bulky to me. My preferred pipeline, which is well maintained, is quantification of reads against a transcriptome with salmon, aggregation of transcript abundance estimates to the gene level with tximport, and differential analysis with edgeR; the latter can also be done with DESeq2. These tools have awesome tutorials, and the developers are responsive to issues at BioC. You might give them a try.
RNA-seq is only a side project for me, so I consider myself rather an amateur. However, I use pipelines similar to ATpoint's, with STAR as the alignment tool and featureCounts for quantification. My point is to showcase additional options, not to compare this with ATpoint's suggestion. However, I do second ATpoint's opinion on the HISAT2 pipeline: I had to implement it for a customer and it feels more clunky than necessary...
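For readers unfamiliar with the aggregation step in the salmon/tximport pipeline described above, here is a rough Python sketch of the core idea: summing transcript-level estimated counts per gene. It is only an illustration. tximport is an R package and does more than this (for example, it also keeps track of transcript lengths), and the function name, the `tx2gene` mapping and all values below are hypothetical.

```python
from collections import defaultdict


def aggregate_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimated counts per gene.

    tx_counts: dict mapping transcript ID -> estimated count
    tx2gene:   dict mapping transcript ID -> gene ID (hypothetical mapping)
    Transcripts that are missing from tx2gene are skipped.
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # transcript not annotated to any gene
        gene_counts[gene] += count
    return dict(gene_counts)


# Made-up illustrative values, not real data
tx_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0, "tx_unmapped": 3.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = aggregate_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

A transcript that has no entry in the mapping simply contributes nothing, which mirrors the point made above: features absent from the reference are never counted.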
FYI, 2-5% of sequence coming from rRNA is really high for poly-A enriched data. You should instead expect <<1%.
Hmm... just saw that it is "only" about 1.5-2%. But still not <<1%
I think 2% is what you're supposed to get if rRNA depletion works well, rather than due to poly-A enrichment.
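If you want to check where your own libraries fall relative to the percentages discussed here, a small Python sketch along these lines might help. It assumes you already have per-feature read counts that include rRNA loci (for example from an alignment quantified against an annotation containing rRNA genes) and a list of the rRNA feature IDs; the function names, IDs and numbers are made up for illustration.

```python
def rrna_percent(counts, rrna_ids):
    """Return the percentage of counted reads assigned to rRNA features."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    rrna = sum(n for feature, n in counts.items() if feature in rrna_ids)
    return 100.0 * rrna / total


def drop_rrna(counts, rrna_ids):
    """Return a copy of the count table without the rRNA features."""
    return {f: n for f, n in counts.items() if f not in rrna_ids}


# Made-up illustrative values, not real data
counts = {"geneA": 600, "geneB": 380, "rRNA_1": 15, "rRNA_2": 5}
rrna_ids = {"rRNA_1", "rRNA_2"}

pct = rrna_percent(counts, rrna_ids)
filtered = drop_rrna(counts, rrna_ids)
print(f"rRNA share: {pct:.1f}%")
```

Dropping the rRNA features from the table is the in silico removal asked about in the question; whether it changes anything downstream depends on whether those features were in the reference to begin with.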