Hi, while calculating RPKM, how do I get the number of reads mapped to the genome? The total read count is 11851490.
I ran samtools flagstat file.bam and got:
12955438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12255658 + 0 mapped (94.60%:-nan%)
12955438 + 0 paired in sequencing
6477719 + 0 read1
6477719 + 0 read2
11999942 + 0 properly paired (92.62%:-nan%)
12234952 + 0 with itself and mate mapped
20706 + 0 singletons (0.16%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
From the above result:
1) Which value should I take as the number of reads mapped to the genome when calculating RPKM?
2) The total read count is 11851490, so how does it increase to 12955438 and 12255658?
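Thank you. Can you please tell me the difference between 12255658 + 0 mapped (94.60%:-nan%) and 12955438 + 0 paired in sequencing?
12955438 is the number of entries (alignment records) in the BAM file, whether aligned or not. A read that aligns to more than one location can contribute more than one entry, which is how this number can exceed the number of reads sequenced. 12255658 is the number of those entries that are aligned, i.e. 94.60% of the total.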
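As a rough illustration (a sketch, assuming the flagstat output from the question is saved as plain text in that older samtools format), each percentage is a count divided by the "in total" entry count. For this particular output the categories also add up: mapped entries are the ones whose mate is mapped too plus the singletons. The helper name parse_flagstat is hypothetical.

```python
import re

# Flagstat output copied from the question (older samtools format).
FLAGSTAT = """\
12955438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12255658 + 0 mapped (94.60%:-nan%)
12955438 + 0 paired in sequencing
6477719 + 0 read1
6477719 + 0 read2
11999942 + 0 properly paired (92.62%:-nan%)
12234952 + 0 with itself and mate mapped
20706 + 0 singletons (0.16%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""

# "<passed> + <failed> <description>", optionally followed by a
# parenthesised percentage such as "(94.60%:-nan%)", which is dropped.
LINE = re.compile(r"^(\d+) \+ (\d+) (.*?)(?: \([^()]*%[^()]*\))?$")


def parse_flagstat(text):
    """Hypothetical helper: map each description to its QC-passed count."""
    counts = {}
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group(3)] = int(match.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)
total = counts["in total (QC-passed reads + QC-failed reads)"]
mapped = counts["mapped"]
both_mapped = counts["with itself and mate mapped"]

print(f"mapped: {mapped / total:.2%}")  # mapped: 94.60%
# In this output, every mapped entry has its mate mapped or is a singleton.
print(mapped == both_mapped + counts["singletons"])  # True
# Entries with both mates mapped but not flagged as a proper pair.
print(both_mapped - counts["properly paired"])  # 235010
```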
So when calculating RPKM, would it be correct or meaningful to take 12255658 as the total number of reads mapped to the genome?
Can you please also tell me the difference between 11999942 + 0 properly paired (92.62%:-nan%) and 12234952 + 0 with itself and mate mapped?
What is the difference between mapped reads and properly paired reads?
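Using 12255658 would yield values that are artificially small, because it counts alignment entries, which can outnumber the distinct reads, and so inflates the denominator. The two numbers you referenced are, respectively, properly paired reads and properly paired reads plus discordant pairs (e.g., mates with the wrong relative orientation); in both cases the read and its mate are mapped.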
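For context, RPKM is commonly defined as RPKM = C × 10^9 / (N × L), where C is the number of reads assigned to a feature, L is the feature length in bases and N is the total number of mapped reads. Whatever is used for N therefore scales every value inversely. A minimal sketch with a hypothetical function name and made-up numbers:

```python
def rpkm(feature_reads, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    feature_reads:      reads assigned to the feature (C)
    feature_length_bp:  feature length in bases (L)
    total_mapped_reads: library size used for normalization (N)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and mapped-read total must be positive")
    # 10**9 = 10**3 (per kilobase) * 10**6 (per million mapped reads)
    return feature_reads * 1e9 / (feature_length_bp * total_mapped_reads)


# Made-up example: 500 reads on a 2,000 bp gene, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Doubling N halves every RPKM value, which is why an overstated mapped-read total makes all values too small by the same factor.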
Since you asked:
Mapped reads are reads that found a match on the reference sequence within the allowed mismatches/indels and any other constraints you applied.
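Whether a record is mapped is stored in bit 0x4 of the SAM FLAG field, while secondary (0x100) and supplementary (0x800) records are extra alignments of a read that already has a primary record. Counting each mapped read once therefore means skipping all three bits, which is what samtools view -c -F 0x904 file.bam does on a real BAM file. The sketch below applies the same logic to a hypothetical list of FLAG values; the function names are illustrative only.

```python
# SAM FLAG bits (from the SAM specification)
FUNMAP = 0x4            # segment unmapped
FSECONDARY = 0x100      # secondary alignment
FSUPPLEMENTARY = 0x800  # supplementary alignment


def count_mapped_reads(flags):
    """Count mapped primary records, i.e. each mapped read once."""
    return sum(1 for f in flags if not f & (FUNMAP | FSECONDARY | FSUPPLEMENTARY))


def count_mapped_entries(flags):
    """Count every mapped record, including secondary/supplementary ones."""
    return sum(1 for f in flags if not f & FUNMAP)


# Hypothetical FLAG values: a proper pair (99, 147), a secondary
# alignment (355), a mapped read1 whose mate is unmapped (73) and
# that unmapped read2 (133).
flags = [99, 147, 355, 73, 133]
print(count_mapped_reads(flags))    # 3
print(count_mapped_entries(flags))  # 4
```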
Proper pairs are pairs of reads that both map, in the expected relative orientation, at a distance consistent with the insert size (a property of the sequencing library that you should know, should have received with the data, or can infer from the TLEN field of a BAM file produced by aligning a subset of the reads).
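The aligner records its proper-pair decision in FLAG bit 0x2, and the insert size can be estimated from TLEN as described above. A minimal sketch, assuming you have already extracted (FLAG, TLEN) values for a subset of alignments with a BAM reader of your choice: it takes the median absolute TLEN over primary read1 records whose mate is also mapped, so each pair is counted once. The function names and toy records are hypothetical.

```python
from statistics import median

# SAM FLAG bits (from the SAM specification)
FPROPER_PAIR = 0x2
FUNMAP = 0x4
FMUNMAP = 0x8
FREAD1 = 0x40
FSECONDARY = 0x100
FSUPPLEMENTARY = 0x800


def is_proper_pair(flag):
    """True if the aligner flagged this record as part of a proper pair."""
    return bool(flag & FPROPER_PAIR)


def estimate_insert_size(records):
    """Median |TLEN| over primary read1 records whose mate is also mapped."""
    skip = FUNMAP | FMUNMAP | FSECONDARY | FSUPPLEMENTARY
    sizes = [
        abs(tlen)
        for flag, tlen in records
        # TLEN is 0 when undefined (e.g. mates on different references).
        if flag & FREAD1 and not flag & skip and tlen != 0
    ]
    if not sizes:
        raise ValueError("no usable read pairs")
    return median(sizes)


# Hypothetical (FLAG, TLEN) records from three pairs plus a singleton.
records = [(99, 310), (147, -310), (99, 290), (147, -290),
           (83, -305), (163, 305), (73, 0)]
print(estimate_insert_size(records))  # 305
```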