My goal was to quantify how many ribosomal reads (5.8S) there were in my library, and use these to normalize gene expression. It makes sense in my experiment. I took 2 approaches:
1 - read count
Affter mapping with Tophat
, no multi-mappers allowed, I used featureCounts
to count reads mapping to features, and tallied up the rRNA reads as those mapping to "LSU-rRNA_Hsa"
.
Using this method, there are between 10^5 to 10^6 rRNA reads for each sample.
2 - mapping to rRNA
Following these instructions, I mapped reads directly to the rRNA sequences provided in the iGenomes bundle (AbundantSequences/hum5SrDNA) using bowtie
.
Using this method, there are between 10^3 to 10^4 mapped rRNA reads for each sample. This is quite a difference!
Questions:
Why such a difference between methods? I actually expected to get more reads when mapping directly to sequence since the multi-mapping issue is avoided.
Is any of these methods adequate for the purpose?
Which alternative method would you suggest?
I might actually not use this data, for a number of reason, but I am curios about what happened here.
I would say that the simple and straightforward answer is that the two reference sequences that you are aligning and counting against are not similar to one another.