Question

Is removing rRNA a necessary step in RNA-seq?

1

4.9 years ago

2822462298 ▴ 120

I used SortMeRNA to remove rRNA sequences from my raw RNA-seq data. For 7 out of 8 samples, ~95% of the reads were retained as clean data. For the remaining sample, only ~75% were retained; around 20% of its reads mapped to the eukaryotic 18S and 28S rRNA sequences. Later, in the differential expression analysis, this weird sample appeared as an outlier in the PCA plot and did not cluster with its replicate samples.

Therefore, I may have to discard this sample from my DE analysis. Alternatively, I could skip the rRNA removal step so that it does not cause this problem. What should I do?

RNA-Seq rRNA sortmerna alignment • 6.8k views

ADD COMMENT • link updated 2.6 years ago by Dreamer ▴ 40 • written 4.9 years ago by 2822462298 ▴ 120

score 2 · Accepted Answer · 2020-02-03

2

4.9 years ago

Devon Ryan 105k

There's no reason to bother removing rRNA if (like most people) you're not quantifying it later. Usually one just looks at the percentage of reads assigned to features (e.g., with MultiQC on the featureCounts output) and excludes outlier samples. That won't tell you that a sample was an outlier due to rRNA contamination, but that's rarely actionable information in and of itself (you'd still want to see it as an outlier in PCA).

ADD COMMENT • link 4.9 years ago by Devon Ryan 105k
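
The check described in this answer could be sketched roughly as follows. This is only an illustration in plain Python, not something from MultiQC or featureCounts: the function name `flag_low_assignment`, the median/MAD cutoff of 3, and all read numbers are assumptions made up for the example.

```python
from statistics import median

def flag_low_assignment(assigned, total, n_mads=3.0):
    """Flag samples whose fraction of assigned reads is unusually low.

    assigned / total: dicts mapping sample name -> read count.
    n_mads: how many median absolute deviations below the median counts
            as an outlier (an arbitrary choice for this sketch).
    """
    frac = {s: assigned[s] / total[s] for s in assigned}
    med = median(frac.values())
    mad = median(abs(f - med) for f in frac.values())
    cutoff = med - n_mads * mad
    flagged = sorted(s for s, f in frac.items() if f < cutoff)
    return flagged, frac

# Hypothetical numbers: seven similar samples and one with many unassigned reads.
total = {f"s{i}": 1_000_000 for i in range(1, 9)}
assigned = {
    "s1": 800_000, "s2": 810_000, "s3": 790_000, "s4": 805_000,
    "s5": 795_000, "s6": 815_000, "s7": 800_000, "s8": 600_000,
}

flagged, fractions = flag_low_assignment(assigned, total)
print(flagged)
```

A sample flagged this way is only a candidate for exclusion; as the answer says, you would still want to see it behave as an outlier in the PCA.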

2

Also, since you generally have some residual rRNA "contamination" even after poly-A selection, removing those reads could throw off normalization factors that take your library size into account.

ADD REPLY • link 4.9 years ago by benformatics 4.1k

0

So does this mean that even after counting one should not remove rRNA genes, e.g., in HISAT2 --> featureCounts --> DESeq2?

ADD REPLY • link 4.4 years ago by ATCG ▴ 400

0

Yep, exactly. You can filter them out of the DE gene list at the end if they are not interesting to you.

ADD REPLY • link 4.4 years ago by benformatics 4.1k

2

No, if you don't care about them then you should remove their counts from the matrix. Otherwise you're needlessly inflating the number of tests you're doing and deflating your power. The normalization should be robust to their presence, but if there's a LOT of rRNA contamination in one sample then that tends to cause issues with the normalization factors.

ADD REPLY • link 4.4 years ago by Devon Ryan 105k
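
A minimal sketch of what removing the counts from the matrix could look like, together with a quick look at how much of each sample is rRNA. It uses plain Python dictionaries rather than any particular package, and the gene IDs, sample names, and counts are placeholders invented for the example.

```python
def rrna_fraction(counts, rrna_ids):
    """Per-sample fraction of counted reads that fall on rRNA genes.

    counts: dict mapping gene ID -> dict mapping sample name -> count.
    rrna_ids: set of gene IDs to treat as rRNA.
    """
    samples = list(next(iter(counts.values())))
    out = {}
    for s in samples:
        total = sum(row[s] for row in counts.values())
        rrna = sum(row[s] for g, row in counts.items() if g in rrna_ids)
        out[s] = rrna / total if total else 0.0
    return out

def drop_genes(counts, ids_to_drop):
    """Return a copy of the count matrix without the given gene IDs."""
    return {g: row for g, row in counts.items() if g not in ids_to_drop}

# Placeholder matrix: two ordinary genes and one rRNA gene.
counts = {
    "geneA": {"s1": 100, "s2": 90},
    "geneB": {"s1": 50, "s2": 40},
    "rRNA_1": {"s1": 50, "s2": 370},
}
rrna_ids = {"rRNA_1"}

print(rrna_fraction(counts, rrna_ids))  # inspect before dropping
filtered = drop_genes(counts, rrna_ids)
print(sorted(filtered))
```

If one sample shows a far larger rRNA fraction than the rest, that is the situation where the normalization factors deserve a closer look.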

1

Yeah, I was thinking they should keep them in for the size factor calculation, and that it would be OK to remove them after that. But checking through the DESeq2 manual, it wasn't obvious to me how to do that. In edgeR, it is a little more straightforward...

I think dropping them right off the bat would only be OK if you checked that their levels were similar across samples.

ADD REPLY • link 4.4 years ago by benformatics 4.1k
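
To make the order of operations in this reply concrete, here is a rough pure-Python sketch of median-of-ratios size factors, the idea that DESeq2's size factors are based on. It is not DESeq2's code, and the function name, gene IDs, and counts are assumptions for illustration. The size factors are computed on the full matrix first, then again after dropping the rRNA row, so the two can be compared.

```python
import math
from statistics import median

def size_factors(counts, samples):
    """Median-of-ratios size factors from a gene -> sample -> count dict."""
    # Log geometric mean per gene, using only genes with nonzero counts
    # in every sample (zeros would make the geometric mean zero).
    log_geo_means = {}
    for gene, row in counts.items():
        if all(row[s] > 0 for s in samples):
            log_geo_means[gene] = sum(math.log(row[s]) for s in samples) / len(samples)
    # Size factor = median over genes of count / geometric mean.
    return {
        s: math.exp(median(math.log(counts[g][s]) - lgm
                           for g, lgm in log_geo_means.items()))
        for s in samples
    }

# Placeholder data: s2 is sequenced twice as deeply and has heavy rRNA carry-over.
samples = ["s1", "s2"]
counts = {
    "geneA": {"s1": 100, "s2": 200},
    "geneB": {"s1": 50, "s2": 100},
    "geneC": {"s1": 30, "s2": 60},
    "rRNA_1": {"s1": 10, "s2": 4000},
}

sf_full = size_factors(counts, samples)  # rRNA kept in
no_rrna = {g: r for g, r in counts.items() if g != "rRNA_1"}
sf_filtered = size_factors(no_rrna, samples)  # rRNA dropped first
print(sf_full, sf_filtered)
```

In this toy case the median ignores the single extreme rRNA row, so both versions agree. With real data, comparing the two is a cheap way to check whether dropping the rRNA genes before normalization would matter.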

0

Thank you, Devon. Yes, this is one of the reasons I would prefer to remove them; the only problem is that I have not been able to find the gene IDs. Any idea where I can get the list of Drosophila melanogaster rDNA gene Ensembl IDs?

ADD REPLY • link 4.4 years ago by ATCG ▴ 400

1

So you did not count them in the first place? Here is the Ensembl Drosophila rDNA scaffold. The same scaffold is available at FlyBase.

ADD REPLY • link 4.4 years ago by GenoMax 148k

0

Thanks, genomax. You filter them from the raw counts before DESeq2. The scaffold works for alignment, but at this point I already have the counts matrix. How can I obtain just the gene IDs?

ADD REPLY • link 4.4 years ago by ATCG ▴ 400
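
One possible way to get just the gene IDs, assuming the counts were made against an Ensembl-style GTF whose gene lines carry a `gene_biotype` attribute: collect the `gene_id` of every gene annotated as rRNA. This is a sketch only. The function name and the example line with its ID are made up, and GTFs from other sources may label the biotype differently or not at all, so check the attribute names in your own file.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def rrna_gene_ids(gtf_lines, biotypes=("rRNA",)):
    """Collect gene_id values of 'gene' features whose gene_biotype matches."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header / comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id and attrs.get("gene_biotype") in biotypes:
            ids.add(gene_id)
    return ids

# Made-up GTF lines; the IDs are placeholders, not real FlyBase identifiers.
example_gtf = [
    "#!genome-build example\n",
    '3R\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_EXAMPLE1"; gene_name "ex1"; gene_biotype "protein_coding";\n',
    'X\tsrc\tgene\t500\t700\t.\t-\t.\tgene_id "GENE_EXAMPLE2"; gene_name "ex2"; gene_biotype "rRNA";\n',
    'X\tsrc\texon\t500\t700\t.\t-\t.\tgene_id "GENE_EXAMPLE2"; gene_biotype "rRNA";\n',
]

ids = rrna_gene_ids(example_gtf)
print(ids)
```

The function accepts any iterable of lines, so an open file handle for the same GTF that was given to featureCounts can be passed directly, and the resulting set can then be used to drop rows from the counts matrix.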

0

If your reference does not contain the rRNA genes, rRNA reads could be mapped to coding genes that share partial sequence similarity with rRNAs. So removing them before alignment might be a better choice.

ADD REPLY • link 2.6 years ago by Dreamer ▴ 40