We have tested a new rRNA depletion protocol and we would like to control for its efficiency.
We have three RNA-Seq samples extracted from Drosophila and would like to check how well the depletion worked.
I was wondering what the most efficient way to do that is. Since I map the samples with STAR, is it a good idea to quantify the BAM files using the GTF (this file contains rRNA genes such as 5S, 5.8S, 18S, etc.)? If the depletion was good, these genes should show very low quantified values.
Would it make more sense to run STAR with the original FASTQ files only against an index built from the rRNA sequences?
Reading some of the past posts here gave me the idea of testing bbduk.sh from the BBMap suite, with something like this:
bbduk.sh in1=read1.fq in2=read2.fq out1=non_rRNA_1.fq out2=non_rRNA_2.fq outm=rRNA_1.fq outm2=rRNA_2.fq ref=my_rRNA_sequences.fasta
Would something like that work?
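One way to turn the BBDuk output into a depletion estimate is to compare the number of reads that matched the rRNA reference with the number of input reads. Below is a minimal shell sketch, assuming uncompressed four-line FASTQ records. The helpers count_reads and percent_matched are hypothetical names, not part of BBMap, and the file names in the usage comment are placeholders. Note that in BBDuk, out= receives the reads that do not match the reference and outm= receives the reads that do, so the matched count comes from the outm= file.

```shell
# Sketch: estimate the rRNA fraction from FASTQ read counts.
# Assumes plain (uncompressed) FASTQ with exactly 4 lines per record.

# Print the number of reads in a FASTQ file.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Print matched reads as a percentage of total reads (two decimals),
# or "NA" when the total is zero.
percent_matched() {
    awk -v m="$1" -v t="$2" 'BEGIN {
        if (t == 0) { print "NA" } else { printf "%.2f\n", 100 * m / t }
    }'
}

# Hypothetical usage, where matched_1.fq is the file given to outm=:
#   percent_matched "$(count_reads matched_1.fq)" "$(count_reads read1.fq)"
```

A low percentage for all three samples would suggest that the depletion worked; comparing against a non-depleted library, if one is available, gives the number some context.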
I have a follow-up question regarding this project.
We would like to exclude possible contamination from foreign genomes. Would bbduk be a better fit here (in particular, faster) than STAR? And would the same command I used above for checking against rRNA apply, just with different FASTA files?
e.g.
Thanks.
In principle, yes, that should work. Are you interested in collecting the reads that don't match the expected genome? Since you are dealing with short reads, some reads may be selected simply because they happen to match short stretches present in both genomes.
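As a rough sketch of reusing the same filter for a contamination screen, the helper below only prints a BBDuk command so it can be inspected before running. The function name bbduk_screen_cmd, the output prefix, and all file names are hypothetical. The options used (in1/in2, out1/out2, outm/outm2, ref, k, stats) are BBDuk parameters; the default of k=31 is an assumption to adjust for your data.

```shell
# Sketch: print (do not run) a BBDuk command that splits read pairs by
# whether they share k-mers with a reference FASTA.
# Arguments: reference FASTA, read 1, read 2, output prefix, optional k.
bbduk_screen_cmd() {
    local ref=$1 r1=$2 r2=$3 prefix=$4 k=${5:-31}
    printf 'bbduk.sh in1=%s in2=%s ' "$r1" "$r2"
    # out1/out2: pairs with no k-mer match to the reference
    printf 'out1=%s_unmatched_1.fq out2=%s_unmatched_2.fq ' "$prefix" "$prefix"
    # outm/outm2: pairs that matched the reference
    printf 'outm=%s_matched_1.fq outm2=%s_matched_2.fq ' "$prefix" "$prefix"
    # stats: per-reference-sequence hit summary
    printf 'ref=%s k=%s stats=%s_stats.txt\n' "$ref" "$k" "$prefix"
}

# Hypothetical usage:
#   bbduk_screen_cmd contaminants.fasta read1.fq read2.fq screen
```

With a FASTA of suspected contaminant genomes as the reference, the matched files hold the candidate contaminant reads, and the stats file indicates which reference sequences attracted hits.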
Yes, short reads. We have paired-end reads, 75 bases long. Would it make sense to increase the k-mer length to, e.g., k=40?
You want to keep k shorter than half the read length to ensure good initial matches, so something around 30 or 35 should be fine.
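The rule of thumb above can be sketched as a small helper that picks the largest k strictly below half the read length. The function name choose_k is hypothetical. The default cap of 31 is an assumption based on the BBDuk guide describing k-mer lengths of 1-31; check it against the version you have installed, and pass a different cap as the second argument if needed.

```shell
# Sketch: choose a k-mer length strictly below half the read length,
# limited by an upper cap (default 31, an assumption; see above).
choose_k() {
    local read_len=$1
    local max_k=${2:-31}
    # Largest integer strictly smaller than read_len / 2.
    local k=$(( (read_len - 1) / 2 ))
    if [ "$k" -gt "$max_k" ]; then k=$max_k; fi
    if [ "$k" -lt 1 ]; then k=1; fi
    echo "$k"
}

# Hypothetical usage for 75-base reads:
#   choose_k 75
```

For 75-base reads, half the read length is 37.5, so the uncapped choice would be 37 and the capped choice is whatever limit is passed in.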