Question

How well did the rRNA depletion work for RNA-Seq experiments?

22 days ago

gogeni5529 ▴ 60

We have tested a new rRNA depletion protocol and we would like to control for its efficiency.

We have three RNA-Seq samples extracted from Drosophila, and we would like to check how well the depletion worked.

I was wondering what the most efficient way to do that is. As I map the samples using STAR, is it a good idea to quantify the BAM files using the GTF (this file contains rRNA genes, such as 5S, 5.8S, 18S etc.)? If the depletion was good, these genes should show very low quantified values.
Would it make more sense to run STAR with the original fastq files against an rRNA-only index?
Reading some of the past posts here gave me the idea of testing bbduk.sh from the BBMap suite, something like this:

bbduk.sh in1=read1.fq in2=read2.fq out1=rRNA1.fq out2=rRNA2.fq outm=NON_ribo.fq outm2=NON_ribo.fa ref=my_rRNA_sequences.fasta

Would something like that work?

contamination star rrna bbduk ribodetector • 640 views

ADD COMMENT • link updated 10 days ago by GenoMax 149k • written 22 days ago by gogeni5529 ▴ 60

I have a follow-up question regarding this project.

We would like to exclude possible contamination from foreign genomes. Would bbduk be a better fit here (and in particular faster) than STAR?

Would the same command that I used above for checking against rRNA apply again, just with a different fasta file?

e.g.

bbduk.sh in1=read1.fq in2=read2.fq outm1=correct1.fq outm2=correct2.fq outu1=contamination1.fq outu2=contamination2.fq ref=organism_to_check.fasta

thanks

ADD REPLY • link 10 days ago by gogeni5529 ▴ 60

In principle, yes, that should work. Are you interested in collecting reads that don't match the expected genome? Since you are dealing with short reads, there is a chance that some reads get selected because they happen to match short stretches in both genomes.

ADD REPLY • link 10 days ago by GenoMax 149k

Yes, short reads. We have paired-end reads that are 75 bases long. Would it maybe make sense to increase the kmer length to e.g. k=40?

ADD REPLY • link 10 days ago by gogeni5529 ▴ 60

You want to keep k shorter than half the read length to ensure good initial matches, so perhaps something around 30 or 35 should be fine.
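As a rough sketch of that rule of thumb: the hypothetical helper `pick_k` below returns the largest k that is still shorter than half the read length. It additionally caps k at 31, on the assumption (from the BBDuk guide, as far as I recall; worth checking against your installed version) that BBDuk supports k-mer lengths of 1-31, in which case k=35 or k=40 would not be accepted. The file names are placeholders taken from the command above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: largest k strictly shorter than half the read length,
# capped at 31 (assumed upper limit for BBDuk's k parameter).
pick_k() {
    k=$(( ($1 - 1) / 2 ))
    if [ "$k" -gt 31 ]; then
        k=31
    fi
    echo "$k"
}

# Screen 75 bp paired-end reads against a reference of interest.
# Pairs sharing a k-mer with the reference go to outm*, all others to outu*.
# Assumes bbduk.sh (BBMap suite) is on PATH.
screen_reads() {
    bbduk.sh in1=read1.fq in2=read2.fq \
        outm1=matched1.fq outm2=matched2.fq \
        outu1=unmatched1.fq outu2=unmatched2.fq \
        ref=organism_to_check.fasta \
        k="$(pick_k 75)"
}
```

For 75-base reads `pick_k 75` gives 31 under these assumptions, so the cap rather than the half-read-length rule is the limiting factor here.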

ADD REPLY • link 10 days ago by GenoMax 149k

21 days ago

dsull ★ 7.2k

I’ve personally used bowtie2 against an index only containing rRNAs, but I imagine any other solution (bbduk, star, etc.) will also work.

In answer to your first point, you could also add those rRNAs to your existing STAR index. However, you might end up with reads that map equally well to both an rRNA and a part of your genome, and I’m not sure what your quantification strategy is for dealing with multimappers.

ADD COMMENT • link 21 days ago by dsull ★ 7.2k

You're right regarding multimappers, especially since rRNA-related reads could be masked by being rejected in the alignment due to too many hits (usually, you have multiple rRNA clusters distributed over your genome).
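If you align against an rRNA-only index instead, one way to keep such reads from being overlooked might be to add up the uniquely mapped, multi-mapped, and "too many loci" percentages from STAR's Log.final.out. The sketch below assumes the field labels as I remember them from STAR's summary file, so please compare them with your own Log.final.out; `star_rrna_percent` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: sum the percentages of reads that STAR reports as
# uniquely mapped, mapped to multiple loci, or mapped to too many loci.
# With an rRNA-only index this approximates the rRNA share of the library.
# Usage: star_rrna_percent Log.final.out
star_rrna_percent() {
    awk -F'|' '
        /Uniquely mapped reads %/ ||
        /% of reads mapped to multiple loci/ ||
        /% of reads mapped to too many loci/ {
            gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
            total += $2
        }
        END { printf "%.2f\n", total }
    ' "$1"
}
```

Reads in the "too many loci" category are not written to the BAM file, which is why counting from the summary file rather than from the alignments may be the safer choice here.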

ADD REPLY • link 21 days ago by michael.ante ★ 4.0k

At the moment we are not really interested in gene quantification, but mainly in seeing how good the rRNA depletion in the library preparation was. For that reason it would be best to avoid having it in the complete data set. Running STAR (or bowtie2, for that matter) against only the rRNA was also an alternative, but I was wondering if there was something more precise or specific than that.

I'll give both bbduk and the star+rRNA-index a go.

thanks to all.

ADD REPLY • link 21 days ago by gogeni5529 ▴ 60

score 2 · Accepted Answer · 2025-01-30

"we would like to control for its efficiency."

In this case you want to see how much rRNA remains in the samples? If so, the bbduk.sh method will extract the rRNA reads, if present.

You need to use outu1= and outu2= in your command line (you want reads that do not match rRNA to go to these files) instead of outm= and outm2=. It may be more explicit to change out1= and out2= to outm1= and outm2=, to indicate that these reads match the reference used.
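Put together, a minimal sketch of the adjusted call could look like this. The input and reference names come from the question; the output names and the optional `stats=` file are placeholders I added. `count_reads` and `rrna_percent` are hypothetical helpers that estimate the remaining rRNA share by counting FASTQ records, assuming uncompressed FASTQ with four lines per read.

```shell
#!/usr/bin/env bash

# Split read pairs into rRNA-matching (outm*) and non-matching (outu*) files.
# Assumes bbduk.sh (BBMap suite) is on PATH; output names are placeholders.
split_rrna() {
    bbduk.sh in1=read1.fq in2=read2.fq \
        outm1=rRNA1.fq outm2=rRNA2.fq \
        outu1=non_ribo1.fq outu2=non_ribo2.fq \
        ref=my_rRNA_sequences.fasta \
        stats=rRNA_stats.txt
}

# Count records in an uncompressed FASTQ file (4 lines per read).
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: percentage of reads that matched the rRNA reference.
# Usage: rrna_percent rRNA1.fq non_ribo1.fq
rrna_percent() {
    awk -v m="$(count_reads "$1")" -v u="$(count_reads "$2")" '
        BEGIN {
            if (m + u == 0) { print "NA" }
            else { printf "%.2f\n", 100 * m / (m + u) }
        }'
}
```

After running `split_rrna`, something like `rrna_percent rRNA1.fq non_ribo1.fq` would give the percentage of read pairs assigned to rRNA for that sample, which can then be compared across the three libraries.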

You could also use a program like sortmerna: https://github.com/sortmerna/sortmerna