Hi there,
I want to convert small RNA read counts into reads per million (RPM). What is the correct/best way to do this?
Small RNAs map to multiple locations in the genome. Here is a plot of the read length distribution; the reads are of variable length.
https://www.dropbox.com/s/10dn098492zst9i/Length.png?dl=0
I mapped with bowtie2. Here are the mapping statistics for one of the mapped libraries:
2601833 reads; of these:
2601833 (100.00%) were unpaired; of these:
**352919 (13.56%) aligned 0 times**
**362282 (13.92%) aligned exactly 1 time**
**1886632 (72.51%) aligned >1 times**
86.44% overall alignment rate
Total_number_mapped_reads= 362282 + 1886632 = 2248914
The problem is that 1886632 reads map to multiple locations, so the total number of mapped locations is higher than 2248914. If every location of the multi-mapped reads is counted, **Total_number_mapped_reads is 5432552**.
So, which number should I use for Total_number_mapped_reads to compute reads per million?
Thanks!
Thanks, Igor!
I updated the post above to clarify further.
This is not typical miRNA-seq data, which is enriched for 18-30 nt RNAs. Mapping with bowtie2 is fine; I get around 85-90% of reads mapped. My only concern is normalization when converting read counts into reads per million, because Total_number_mapped_reads will differ depending on whether it is the number of mapped reads or the number of mapped locations.
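To make the dependence on the denominator concrete, here is a minimal Python sketch using the totals from the bowtie2 summary above. The function name `rpm` and the raw count of 1000 reads for a feature are hypothetical, chosen only for illustration; this is not a recommendation of either denominator.

```python
def rpm(count, total):
    """Reads per million: scale a raw count by a library-size denominator."""
    return count / total * 1_000_000

# Totals taken from the bowtie2 summary in this thread.
uniquely_mapped = 362282
multi_mapped = 1886632
mapped_reads = uniquely_mapped + multi_mapped  # each read counted once
mapped_locations = 5432552                     # every reported location counted

# Hypothetical raw count for one small RNA (not from the data).
feature_count = 1000

rpm_by_reads = rpm(feature_count, mapped_reads)
rpm_by_locations = rpm(feature_count, mapped_locations)

print(f"RPM (per mapped read):     {rpm_by_reads:.2f}")
print(f"RPM (per mapped location): {rpm_by_locations:.2f}")
```

Whichever denominator is used, the per-feature counts in the numerator should be tallied the same way (reads or locations), otherwise the RPM values are not comparable across libraries.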
I don't know if you can say the mapping is fine, since the overwhelming majority of reads are multi-mapped. Although these are not miRNAs, it's the same challenge (the fragments are too short to be confidently mapped).
Hi Chirag,
I wonder what conclusion you arrived at after these years. I'm thinking of using featureCounts with multi-mapping reads taken into account (the featureCounts -M parameter) after aligning the reads with Bowtie2, but I'm not sure how to do the RPM normalization for the counts.
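One option sometimes used for multi-mappers is fractional counting: a read with n reported alignments contributes 1/n to each, so the counts still sum to the number of mapped reads and that number can serve as the RPM denominator. featureCounts offers this through `--fraction` used together with `-M`; per its documentation it identifies multi-mappers via the NH tag, so it is worth checking that the aligner's output carries that tag. The sketch below only illustrates the arithmetic in Python. The read IDs, feature names, and the `alignments` dictionary are made-up toy data, not output from any tool.

```python
from collections import defaultdict

# Toy input (hypothetical): read ID -> features hit by each of its alignments.
alignments = {
    "read1": ["sRNA_A"],                                # uniquely mapped
    "read2": ["sRNA_A", "sRNA_B"],                      # 2 locations
    "read3": ["sRNA_B", "sRNA_B", "sRNA_C", "sRNA_C"],  # 4 locations
}

def fractional_counts(alignments):
    """Give each alignment of a read a weight of 1/n, n = alignments of that read."""
    counts = defaultdict(float)
    for features in alignments.values():
        weight = 1.0 / len(features)
        for feature in features:
            counts[feature] += weight
    return dict(counts)

def to_rpm(counts, total_reads):
    """Scale fractional counts to reads per million mapped reads."""
    return {f: c / total_reads * 1_000_000 for f, c in counts.items()}

counts = fractional_counts(alignments)
rpm_values = to_rpm(counts, total_reads=len(alignments))

for feature in sorted(rpm_values):
    print(feature, counts[feature], round(rpm_values[feature], 1))
```

Because every read carries a total weight of 1, the fractional counts sum to the number of mapped reads, which avoids having to choose between the read-based and location-based totals.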