Hi, I'm currently trying to find out the level of rRNA contamination in my RNA-Seq samples. I saw this mentioned before: aligning the samples to rRNA sequences from GenBank using bowtie2 would give me an estimate, but I just wanted to be sure:
- If I can follow this technique?
- If yes, is there a way to extract all rRNA subunit sequences (I'm not even sure if I need them all for this reason) from the reference genome with the annotation file? (I only see the 16S sequence for the organism on GenBank :/)
- Would I need to remove these sequences before conducting a differential gene expression analysis?
- If yes, can I then (following the method above) disregard the rRNA sequences by using only the unaligned reads from bowtie2 output?
I am a beginner, any help/direction is much appreciated :) (I've tried to look up other posts but I'm not sure it helped)
I agree with Meisam that BBDuk is a good option for quantifying and filtering rRNA reads, but for a standard differential expression analysis it is not a necessary step (though quantifying rRNA contamination may be important for data QC). Instead, align your reads to your genome, get the gene counts and remove rRNA genes from the gene counts before continuing.
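In case it helps, here is a minimal Python sketch of the "remove rRNA genes from the gene counts" step. It is only an illustration under some assumptions: that your annotation is a GFF3 file whose rRNA features have `rRNA` in the type column, that the gene IDs in your count table match the `locus_tag` attribute (the `id_key` parameter is a made-up name for this example, change it to whatever attribute your counting tool used), and that counts are already loaded as a simple gene-to-count mapping. The toy IDs and numbers are invented.

```python
def rrna_ids_from_gff(gff_lines, id_key="locus_tag"):
    """Collect the id_key attribute of every feature whose type (column 3) is rRNA."""
    ids = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "rRNA":
            continue
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if id_key in attrs:
            ids.add(attrs[id_key])
    return ids


def split_counts(counts, rrna_ids):
    """Return (counts without rRNA genes, fraction of counted reads on rRNA genes)."""
    total = sum(counts.values())
    rrna_total = sum(n for gene, n in counts.items() if gene in rrna_ids)
    kept = {gene: n for gene, n in counts.items() if gene not in rrna_ids}
    fraction = rrna_total / total if total else 0.0
    return kept, fraction


# Toy annotation and counts (invented values, for illustration only)
toy_gff = [
    "##gff-version 3",
    "\t".join(["chr", "toy", "CDS", "1", "900", ".", "+", "0", "ID=cds1;locus_tag=TOY_0001"]),
    "\t".join(["chr", "toy", "CDS", "1000", "1500", ".", "-", "0", "ID=cds2;locus_tag=TOY_0002"]),
    "\t".join(["chr", "toy", "rRNA", "2000", "3500", ".", "+", ".", "ID=rna1;locus_tag=TOY_r01"]),
    "\t".join(["chr", "toy", "rRNA", "3600", "6500", ".", "+", ".", "ID=rna2;locus_tag=TOY_r02"]),
]
toy_counts = {"TOY_0001": 120, "TOY_0002": 30, "TOY_r01": 800, "TOY_r02": 50}

rrna_ids = rrna_ids_from_gff(toy_gff)
filtered_counts, rrna_fraction = split_counts(toy_counts, rrna_ids)
print(filtered_counts, rrna_fraction)
```

Note that the fraction here is relative to reads assigned to genes, not to all sequenced reads, and it may underestimate rRNA if your counting step drops multi-mapped reads (rRNA operons are often present in several copies), so treat it as a rough QC number.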
Thanks jv, I had missed the second part of the question. I totally agree that in this case it's not a necessary step.
Thanks for your help, I wanted to know for both scenarios (getting an estimate & DGE), so this is perfect. Since I'm working with bacteria, I'll probably have to extract the rRNA sequences from the reference genome first to use BBDuk (since it's able to quantify), but I understand that I don't need to remove them before generating my count data.
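For the extraction step, a rough Python sketch of pulling every annotated rRNA (all subunits and copies) out of the reference genome could look like the following. It assumes a GFF3 annotation with `rRNA` in the type column and 1-based inclusive coordinates, and a genome FASTA whose sequence names match the first GFF column. The function names and the toy genome are made up for this example; it is not a replacement for a dedicated tool.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence name: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before the first space
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def extract_rrna(genome, gff_lines):
    """Return [(name, sequence)] for every rRNA feature in the GFF."""
    records = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "rRNA":
            continue
        contig, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        seq = genome[contig][start - 1:end]  # GFF is 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        records.append((f"{contig}:{start}-{end}({strand})", seq))
    return records


# Toy genome and annotation (invented, for illustration only)
toy_genome = read_fasta([">chr toy contig", "ACGTTG", "CAAGGC"])
toy_gff = [
    "##gff-version 3",
    "\t".join(["chr", "toy", "gene", "1", "12", ".", "+", ".", "ID=gene1"]),
    "\t".join(["chr", "toy", "rRNA", "2", "5", ".", "+", ".", "ID=rna1"]),
    "\t".join(["chr", "toy", "rRNA", "8", "11", ".", "-", ".", "ID=rna2"]),
]
rrna_records = extract_rrna(toy_genome, toy_gff)
rrna_fasta = "".join(f">{name}\n{seq}\n" for name, seq in rrna_records)
print(rrna_fasta)
```

With real files you would read the lines from the genome FASTA and GFF on disk and write `rrna_fasta` to a file, which could then serve as the reference for BBDuk (its `ref=` argument) or as the sequences to build a bowtie2 index from.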