rRNA contaminants RNASeq
19 months ago
plain_text • 0

Hi, I'm currently trying to find out the level of rRNA contamination in my RNASeq samples. I saw this was mentioned before: aligning the samples to rRNA sequences from GenBank using bowtie2 would give me an estimate, but I just wanted to be sure:

  1. If I can follow this technique?
  2. If yes, is there a way to extract all rRNA subunit sequences (I'm not even sure if I need them all for this reason) from the reference genome with the annotation file? (I only see the 16S sequence for the organism on GenBank :/)
  3. Would I need to remove these sequences before conducting a differential gene expression analysis?
  4. If yes, can I then (following the method above) disregard the rRNA sequences by using only the unaligned reads from bowtie2 output?

I am a beginner, any help/direction is much appreciated :) (I've tried to look up other posts but I'm not sure it helped)

rRNA Removal RNASeq Transcriptomics • 1.4k views
19 months ago
Meisam ▴ 250

Welcome to Biostars,

You can remove the rRNA reads before aligning your reads to the reference genome. There are various filtering tools you can use; my personal preference is "bbduk" from BBTools. Once you have installed the tool, download an rDNA FASTA file from NCBI and pass it as the ref argument to bbduk to remove all reads containing rDNA sequence:

bbduk.sh in={Your FASTQ file} ref=rDNA_complete_NCBI.fa out={Name for your trimmed output}

There are many other arguments you can pass as well; read the bbduk manual for complete details. Also take a look at the discussions under this previous Biostars post for alternative approaches.


I agree with Meisam that BBDuk is a good option for quantifying and filtering rRNA reads, but for a standard differential expression analysis it is not a necessary step (though quantifying rRNA contamination may be important for data QC). Instead, align your reads to your genome, get the gene counts and remove rRNA genes from the gene counts before continuing.
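As a rough sketch of that last step, assuming you already have a gene-to-count table (e.g. from featureCounts or HTSeq) and a set of rRNA gene IDs taken from your annotation, something like this would give both the rRNA fraction for QC and the filtered counts for DGE. The gene IDs and counts here are made up for illustration, and `split_rrna_counts` is just a hypothetical helper name:

```python
def split_rrna_counts(counts, rrna_ids):
    """Return (counts without rRNA genes, fraction of counted reads assigned to rRNA)."""
    total = sum(counts.values())
    rrna_total = sum(n for gene, n in counts.items() if gene in rrna_ids)
    filtered = {gene: n for gene, n in counts.items() if gene not in rrna_ids}
    fraction = rrna_total / total if total else 0.0
    return filtered, fraction


# Hypothetical gene IDs and counts, for illustration only.
counts = {"geneA": 120, "geneB": 30, "rrsA": 800, "rrlA": 50}
rrna_ids = {"rrsA", "rrlA"}

filtered, rrna_fraction = split_rrna_counts(counts, rrna_ids)
print(filtered)
print(f"rRNA fraction of counted reads: {rrna_fraction:.2%}")
```

One caveat: depending on the counting tool's settings, reads that map to several rRNA operon copies may not be counted at all, so this fraction can underestimate the real contamination.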


Thanks jv, I had missed the second part of the question. I totally agree: in this case it's not a necessary step.


Thanks for your help. I wanted to know for both scenarios (getting an estimate and DGE), so this is perfect. Since I'm working with bacteria, I'll probably have to extract the rRNA sequences from the reference genome first to use bbduk (since it's able to quantify), but I understand that I don't need to remove them before generating my counts data.
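For the extraction step, here is a minimal sketch in plain Python, assuming the annotation is a GFF3 file that marks rRNA genes with feature type `rRNA` in column 3 (worth checking in your own file). The toy genome and annotation lines are made up, and the function names are hypothetical helpers, not part of any library:

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # ID is the first word of the header
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def extract_rrna(genome, gff_lines):
    """Yield (header, sequence) for every GFF3 feature of type 'rRNA'."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "rRNA":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        seq = genome[seqid][start - 1:end]  # GFF3 coordinates are 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        yield f"{seqid}:{start}-{end}({strand}) {cols[8]}", seq


# Toy data, for illustration only.
fasta = [">chr1 toy genome", "AAACCCGGGT", "TTACGTACGT"]
gff = [
    "##gff-version 3",
    "chr1\ttoy\trRNA\t4\t9\t.\t+\t.\tID=rna0;product=16S ribosomal RNA",
    "chr1\ttoy\tCDS\t1\t3\t.\t+\t0\tID=cds0",
    "chr1\ttoy\trRNA\t11\t14\t.\t-\t.\tID=rna1;product=23S ribosomal RNA",
]

genome = read_fasta(fasta)
records = list(extract_rrna(genome, gff))
for header, seq in records:
    print(f">{header}\n{seq}")
```

With real files you would pass open file handles instead of the lists, write the output to a FASTA file, and use that file as the ref argument to bbduk.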
