Hi everyone,
I am trying to check whether my RNA-seq FASTQ file has any rRNA contamination. I suspect contamination because the GC content plot from FastQC has many peaks and the module fails. I ran SortMeRNA with its default eukaryotic rRNA database and found that about 18 gig out of 25 gig of reads were classified as rRNA. I am not sure this is right, since my data is from mouse but the database covers all eukaryotes. I would very much appreciate it if someone could point me to a mouse rRNA database. Would it be enough to filter the GENCODE annotation file for rRNA annotations, extract the corresponding FASTA sequences, and use those as the database?
Thanks!
Using BLAST, you can search your data against mouse rRNA sequences. For this you can use the remote option of BLAST together with the species option, and specify mouse as the species.
Thank you, I will try that.
You can normally align the reads against the genome plus annotation (GENCODE), using STAR for example, which can count the number of reads per feature. Then just check what percentage of the counts falls on annotated rRNA genes. You would need "good" samples as a control to judge what is normal and what is bad.
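The percentage check described above can be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: the counts come from STAR run with --quantMode GeneCounts (a ReadsPerGene.out.tab file whose first lines are N_ summary rows, followed by gene ID and count columns), the GTF is a GENCODE file where rRNA genes carry gene_type "rRNA" or "Mt_rRNA", and the function names are made up for illustration. Note also that these gene counts are based on uniquely mapped reads, so multi-mapping rRNA reads may be undercounted and the result is probably best read as a lower bound.

```python
import csv
import re

# Assumption: GENCODE gene_type values treated as rRNA for this check.
RRNA_TYPES = {"rRNA", "Mt_rRNA"}


def rrna_gene_ids(gtf_lines, rrna_types=RRNA_TYPES):
    """Collect gene_id values of GTF 'gene' records annotated as rRNA."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records are needed
        # Parse quoted attributes of the form: key "value";
        attrs = dict(re.findall(r'(\S+) "([^"]*)"', fields[8]))
        if attrs.get("gene_type") in rrna_types:
            ids.add(attrs["gene_id"])
    return ids


def rrna_fraction(counts_lines, rrna_ids, column=1):
    """Fraction of gene-assigned reads that fall on rRNA genes.

    column=1 is the unstranded count column of ReadsPerGene.out.tab;
    2 or 3 would select one of the stranded columns instead.
    """
    rrna = total = 0
    for row in csv.reader(counts_lines, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip N_unmapped, N_multimapping, ... summary rows
        n = int(row[column])
        total += n
        if row[0] in rrna_ids:
            rrna += n
    return rrna / total if total else 0.0


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, pass open file handles instead.
    demo_gtf = [
        'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_type "rRNA";\n',
        'chr1\tHAVANA\tgene\t200\t900\t.\t+\t.\tgene_id "G2"; gene_type "protein_coding";\n',
    ]
    demo_counts = ["N_unmapped\t5\t5\t5\n", "G1\t75\t0\t75\n", "G2\t25\t0\t25\n"]
    frac = rrna_fraction(demo_counts, rrna_gene_ids(demo_gtf))
    print(f"rRNA fraction of gene-assigned reads: {frac:.1%}")
```

With real files you would pass the opened GTF and ReadsPerGene.out.tab handles to the two functions, and run the same calculation on a known "good" sample to get a baseline for comparison.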
rRNA contamination can lead to a low mapping ratio of reads to the genome.