Question

rRNA contaminants RNASeq

0

18 months ago

plain_text • 0

Hi, I'm currently trying to find out the level of rRNA contamination in my RNASeq samples. I saw it mentioned before that aligning the samples to rRNA sequences from GenBank using bowtie2 would give me an estimate, but I just wanted to be sure:

Can I follow this technique?
If yes, is there a way to extract all rRNA subunit sequences (I'm not even sure I need them all for this purpose) from the reference genome with the annotation file? (I only see the 16S sequence for the organism on GenBank :/)
Would I need to remove these sequences before conducting a differential gene expression analysis?
If yes, can I then (following the method above) disregard the rRNA sequences by using only the unaligned reads from the bowtie2 output?

I am a beginner, so any help/direction is much appreciated :) (I've tried to look up other posts, but I'm not sure they helped.)

rRNA Removal RNASeq Transcriptomics • 1.4k views

score 3 · Accepted Answer · 2023-05-03

3

18 months ago

Meisam ▴ 250

Welcome to Biostars,

You can remove the rRNA reads before aligning all of your reads to the reference genome. There are various filtering tools you can use; my personal preference is "bbduk" from BBTools. Once you have installed the tool, download an rDNA FASTA file from NCBI and pass it as the ref argument to bbduk to remove all reads matching the rDNA:

bbduk.sh in={Your FASTQ file} ref=rDNA_complete_NCBI.fa out={Name for your filtered output}

There are many other arguments you can pass as well; read the bbduk manual for complete details. Also take a look at the discussions under this previous Biostars post for alternative approaches.
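If you also want an estimate of the contamination rather than only filtered reads, one option is to keep the matching reads too and compare read counts. The following is a minimal sketch, not a definitive recipe: it assumes uncompressed FASTQ files with four lines per record, the file names and the two helper functions are hypothetical, and `outm=` is the bbduk option that writes the reads matching the reference.

```shell
# Hypothetical file names. The bbduk.sh call is shown as a comment because it
# needs BBTools and real data:
#   bbduk.sh in=sample.fq ref=rDNA_complete_NCBI.fa out=clean.fq outm=rrna.fq

# Count reads in an uncompressed FASTQ file (assumes 4 lines per record).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Percentage of reads in the rRNA-matching file ($2) relative to the input ($1).
rrna_percent() {
    total=$(count_reads "$1")
    matched=$(count_reads "$2")
    awk -v t="$total" -v m="$matched" \
        'BEGIN { if (t > 0) printf "%.2f\n", 100 * m / t; else print "NA" }'
}
```

With the hypothetical names above, `rrna_percent sample.fq rrna.fq` would print the percentage of input reads that matched the rDNA reference. Gzipped input would need to be decompressed first (for example with `gzip -dc`).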

2

I agree with Meisam that BBDuk is a good option for quantifying and filtering rRNA reads, but for a standard differential expression analysis it is not a necessary step (though quantifying rRNA contamination may be important for data QC). Instead, align your reads to your genome, get the gene counts and remove rRNA genes from the gene counts before continuing.
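As a rough sketch of that last step, assuming a tab-separated count table with one header line, gene IDs in the first column and counts for one sample in the second, plus a hypothetical file listing one rRNA gene ID per line (the function and file names are made up for illustration):

```shell
# Keep the header plus every row whose gene ID is NOT in the rRNA ID list.
# Usage: drop_rrna_genes rrna_ids.txt counts.tsv > counts_no_rrna.tsv
drop_rrna_genes() {
    awk -F'\t' '
        FILENAME == ARGV[1] { rrna[$1] = 1; next }  # first file: remember rRNA IDs
        FNR == 1 || !($1 in rrna)                   # second file: header + non-rRNA rows
    ' "$1" "$2"
}

# Percentage of all counts (column 2) assigned to rRNA genes, as a QC figure.
# Usage: rrna_count_share rrna_ids.txt counts.tsv
rrna_count_share() {
    awk -F'\t' '
        FILENAME == ARGV[1] { rrna[$1] = 1; next }
        FNR > 1 { total += $2; if ($1 in rrna) r += $2 }  # skip the header line
        END { if (total > 0) printf "%.2f\n", 100 * r / total; else print "NA" }
    ' "$1" "$2"
}
```

The second function gives the QC estimate from the same table, so both the contamination figure and the cleaned counts come from a single alignment.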

Reply 18 months ago by jv ★ 1.8k

0

Thanks jv, I had missed the second part of the question. I totally agree that in this case it's not a necessary step.

Reply 18 months ago by Meisam ▴ 250

0

Thanks for your help. I wanted to know for both scenarios (getting an estimate and DGE), so this is perfect. Since I'm working with bacteria, I'll probably have to extract the rRNA sequences from the reference genome first to use bbduk (since it's able to quantify), but I understand that I don't need to remove them before generating my count data.
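For the extraction step, one possible route is to pull the rRNA features out of the annotation and then cut those intervals from the genome FASTA. A minimal sketch, assuming a GFF3 annotation whose third column uses the feature type `rRNA` and whose first attribute is `ID=...`; the function name and file names are hypothetical:

```shell
# Convert the rRNA features of a GFF3 file to 6-column BED (0-based start).
# Usage: gff_rrna_to_bed annotation.gff > rrna.bed
gff_rrna_to_bed() {
    awk -F'\t' -v OFS='\t' '
        /^#/ { next }                 # skip comment and directive lines
        $3 == "rRNA" {
            name = $9
            sub(/;.*/, "", name)      # keep only the first attribute
            sub(/^ID=/, "", name)     # strip the ID= prefix if present
            print $1, $4 - 1, $5, name, ".", $7
        }
    ' "$1"
}
```

The resulting BED file could then be given to a tool such as `bedtools getfasta` (for example `bedtools getfasta -fi genome.fa -bed rrna.bed -s -name`) to write the 5S, 16S and 23S sequences into a FASTA file for bbduk's `ref=` argument.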

Reply 18 months ago by plain_text • 0