Question

Would you remove rRNA reads in silico before testing for differential expression?

Asked 5.7 years ago by mschmid ▴ 180

Hello

Do any of you remove reads originating from rRNA in silico before testing for differential expression?

Why or why not? Is it worth it? Or does it depend on the analysis pipeline, in your opinion?

My data is based on poly-A enrichment of mRNA and about 2-5% of the reads (Illumina SE 100bp) are from rRNA operons.

I plan to use the HISAT2 > StringTie > Ballgown workflow as a first strategy to test for differential expression (DE). I might later try other methods, depending on the Ballgown results.
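For anyone who does decide to remove rRNA reads in silico, the bookkeeping can be sketched in Python as below. This is a minimal sketch under stated assumptions: the IDs of rRNA-derived reads are already known (for example, from a separate alignment of the reads against rRNA sequences, which is not shown), and the FASTQ input consists of well-formed four-line records. The names `read_fastq`, `read_id` and `filter_rrna` are hypothetical. The function also reports the fraction of reads removed, which is the number discussed further down in this thread.

```python
from typing import Iterable, Iterator, List, Set, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, '+' line, qualities


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield FASTQ records; assumes well-formed 4-line records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:  # tolerate blank lines, e.g. a trailing newline
            continue
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def read_id(header: str) -> str:
    """Return the read ID: the first whitespace-separated token after '@'."""
    return header[1:].split()[0]


def filter_rrna(lines: Iterable[str], rrna_ids: Set[str]) -> Tuple[List[FastqRecord], float]:
    """Drop reads whose ID is in rrna_ids (a hypothetical, precomputed set).

    Returns the kept records and the fraction of reads that were removed.
    """
    kept: List[FastqRecord] = []
    total = removed = 0
    for record in read_fastq(lines):
        total += 1
        if read_id(record[0]) in rrna_ids:
            removed += 1
        else:
            kept.append(record)
    return kept, (removed / total if total else 0.0)
```

A list of lines or an open file handle can be passed as `lines`; the kept records would then be written back out as FASTQ before alignment.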

Tag: RNA-Seq

As one typically quantifies reads against a transcriptome or a GTF annotation, and neither should include rRNA, one does not remove rRNA reads explicitly: because they are not represented in the reference, they are not counted anyway.

I realize that the workflow you mention is prominent because it was published in a high-profile venue by reputable people, but I still do not see why one should use it. StringTie assembles transcripts, so unless you really need a transcript assembly I would avoid it. Ballgown also seems bulky to me. My preferred pipeline, which is well maintained, is to quantify reads against a transcriptome with salmon, aggregate the transcript abundance estimates to the gene level with tximport, and run the differential analysis with edgeR (this last step can also be done with DESeq2). These tools have excellent tutorials, and their developers are responsive to issues on BioC (Bioconductor). You might give them a try.
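To make the tximport step more concrete, here is a simplified Python sketch of gene-level aggregation: it sums salmon's per-transcript `NumReads` column from a `quant.sf` file into gene-level counts using a transcript-to-gene table. It assumes the tab-separated `quant.sf` header (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) and a `tx2gene` dictionary that you build yourself from your annotation; `gene_counts_from_quant` is a hypothetical name. tximport itself does more than this (for example, it also passes transcript-length information on to the downstream model), so treat the sketch only as an illustration of the summing idea. It also reflects the point above: `quant.sf` only lists transcripts in the index, so reads that map to no indexed transcript never reach the count table.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def gene_counts_from_quant(quant_lines: Iterable[str], tx2gene: Dict[str, str]) -> Dict[str, float]:
    """Sum salmon's per-transcript NumReads into per-gene estimated counts.

    quant_lines: lines of a salmon quant.sf file (an open file works too).
    tx2gene: hypothetical transcript-ID -> gene-ID mapping built by the user.
    Transcripts missing from tx2gene are skipped in this sketch.
    """
    counts: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # no gene mapping: contributes nothing to gene counts
        counts[gene] += float(row["NumReads"])
    return dict(counts)
```

Usage would look like `with open("quant.sf") as fh: gene_counts_from_quant(fh, tx2gene)`, where the path and `tx2gene` are placeholders for your own files.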

Reply by ATpoint 88k, 5.7 years ago

RNA-seq is only a side project for me, so I consider myself rather an amateur. Still, I use a pipeline similar to ATpoint's, with STAR as the aligner and featureCounts for quantification.
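The counting idea behind featureCounts, including why reads falling outside annotated features go uncounted, can be sketched in a few lines of Python. This is a deliberately simplified model, not featureCounts itself: it assumes single-end, unspliced, unstranded alignments given as `(chromosome, start, end)` intervals, and it ignores multi-mapping reads. A read overlapping exons of exactly one gene is assigned to that gene; a read overlapping several genes, or none, is left unassigned. The status labels mimic categories in the featureCounts summary output; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def assign_reads(reads: List[Interval], exons: Dict[str, List[Interval]]) -> Tuple[Counter, Counter]:
    """Assign each read to the single gene whose exons it overlaps.

    Returns (per-gene counts, per-status counts).
    """
    gene_counts: Counter = Counter()
    status: Counter = Counter()
    for read in reads:
        hits = {gene for gene, ivs in exons.items() if any(overlaps(read, iv) for iv in ivs)}
        if not hits:
            status["Unassigned_NoFeatures"] += 1  # e.g. reads from unannotated sequence
        elif len(hits) > 1:
            status["Unassigned_Ambiguity"] += 1  # overlaps more than one gene
        else:
            gene_counts[hits.pop()] += 1
            status["Assigned"] += 1
    return gene_counts, status
```

The real tool uses indexed feature lookups rather than this linear scan, and offers options for strandedness, paired-end fragments and multi-mapping reads.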

My point is to show additional options, not to compare them with ATpoint's suggestion. I do, however, share ATpoint's opinion of the HISAT2 pipeline: I had to implement it for a customer, and it feels clunkier than necessary...

Reply by Carambakaracho ★ 3.3k, 5.7 years ago

FYI, 2-5% of reads coming from rRNA is really high for poly-A-enriched data. You should instead expect <<1%.
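The comparison in this exchange is simple arithmetic on read counts. The sketch below assumes you already know how many reads were identified as rRNA (for example, from a separate alignment against rRNA sequences) and the total number of reads. The default threshold of 1% reflects the "<<1%" expectation stated here for poly-A-enriched libraries; it is a parameter, not an established cutoff. `rrna_percent` and `looks_rrna_rich` are hypothetical names.

```python
def rrna_percent(rrna_reads: int, total_reads: int) -> float:
    """Percentage of reads attributed to rRNA."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * rrna_reads / total_reads


def looks_rrna_rich(rrna_reads: int, total_reads: int, threshold_percent: float = 1.0) -> bool:
    """Flag a poly-A library whose rRNA share exceeds threshold_percent."""
    return rrna_percent(rrna_reads, total_reads) > threshold_percent
```

For a hypothetical library of 20 million reads, 1.5% corresponds to 300,000 rRNA reads, and `looks_rrna_rich(300_000, 20_000_000)` returns `True`.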

Reply by Devon Ryan 105k, 5.7 years ago

Hmm... I just checked, and it is "only" about 1.5-2%. But that is still not <<1%.

Reply by mschmid ▴ 180, 5.7 years ago

I think 2% is what you would expect when rRNA depletion works well, not from poly-A enrichment.

Reply by Devon Ryan 105k, 5.7 years ago