Number of reads mapped to genome
8.0 years ago
vimlakany • 0

Hi, when calculating RPKM, how do I get the number of reads mapped to the genome? The total read count is 11851490.
I have tried samtools flagstat file.bam; the result is:

12955438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
12255658 + 0 mapped (94.60%:-nan%)
12955438 + 0 paired in sequencing
6477719 + 0 read1
6477719 + 0 read2
11999942 + 0 properly paired (92.62%:-nan%)
12234952 + 0 with itself and mate mapped
20706 + 0 singletons (0.16%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

From the above result:
1) Which value should I take as the number of reads mapped to the genome when calculating RPKM?
2) The total read count is 11851490; how does it increase to 12955438 or 12255658?

RNA-Seq • 3.1k views
8.0 years ago

Please run samtools view -c -F 256 -f 66 file.bam and use the number it outputs as the number of fragments. You can then use that for the FPKM calculation (FPKM rather than RPKM, since you have a paired-end dataset).
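To make the flags concrete: -f 66 requires bits 2 (proper pair) and 64 (first in pair) to be set, and -F 256 requires bit 256 (secondary alignment) to be unset. Here is a minimal Python sketch of that filter. The function name and the example flag values are made up for illustration and are not part of samtools:

```python
# SAM flag bits used by the command above (values from the SAM specification).
PROPER_PAIR = 0x2      # 2
FIRST_IN_PAIR = 0x40   # 64
SECONDARY = 0x100      # 256


def counts_as_fragment(flag, require=PROPER_PAIR | FIRST_IN_PAIR, exclude=SECONDARY):
    """Mimic `samtools view -f 66 -F 256`: every required bit set, no excluded bit set."""
    return (flag & require) == require and (flag & exclude) == 0


# Hypothetical flag values, for illustration only.
example_flags = [99, 147, 355, 73, 83]
kept = [f for f in example_flags if counts_as_fragment(f)]
print(kept)  # [99, 83]
```

Flag 147 is dropped because it is the second read of its pair, 355 because it is a secondary alignment, and 73 because it is not in a proper pair, so each properly paired fragment is counted once.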

There are more entries in the BAM file than original reads because of secondary alignments.

As an aside, make sure you have a good reason to use RPKMs/FPKMs, since for the most part they should be avoided.


Thank you. Can you please tell me the difference between 12255658 + 0 mapped (94.60%:-nan%) and 12955438 + 0 paired in sequencing?


12955438 is the total number of entries in the BAM file, whether aligned or not. 12255658 is the number of those entries that aligned, which is 94.60% of the total.
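The flagstat numbers in the question can be checked against each other. A small Python sketch, using only the values from the flagstat output above (the variable names are my own):

```python
# Values copied from the flagstat output in the question.
total = 12955438        # "in total"
mapped = 12255658       # "mapped"
both_mapped = 12234952  # "with itself and mate mapped"
singletons = 20706      # "singletons"

# Percentage of entries that aligned.
pct_mapped = 100 * mapped / total
print(f"{pct_mapped:.2f}%")  # 94.60%

# Mapped entries split into those whose mate also mapped and those whose mate did not.
print(both_mapped + singletons == mapped)  # True
```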


So when calculating RPKM, would it be correct or meaningful to take 12255658 as the total number of reads mapped to the genome?
Can you please tell me the difference between 11999942 + 0 properly paired (92.62%:-nan%) and 12234952 + 0 with itself and mate mapped?
What is the difference between mapped reads and properly paired reads?


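Using 12255658 would yield values that are artificially small, because that number counts individual mapped entries (both mates of each pair, plus any secondary alignments) rather than fragments, so the denominator is too large. The two numbers you asked about are the reads in proper pairs (11999942) and that plus the reads in discordant pairs, e.g., pairs with the wrong relative orientation (12234952).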
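A rough Python sketch of why the denominator matters, using the usual FPKM definition (fragments for the gene, times 1e9, divided by gene length in bp times total mapped fragments). The gene count and gene length are hypothetical, and the fragment total is only a stand-in (half the properly paired count from flagstat); the real value is whatever the samtools view -c command reports:

```python
def fpkm(gene_fragments, gene_length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return gene_fragments * 1e9 / (gene_length_bp * total_fragments)


# Hypothetical gene, for illustration only.
gene_fragments = 500
gene_length_bp = 2000

# Assumption: roughly half of the "properly paired" reads, standing in for
# the output of `samtools view -c -F 256 -f 66 file.bam`.
approx_fragments = 11999942 // 2
mapped_entries = 12255658  # flagstat "mapped": counts reads, not fragments

fpkm_by_fragments = fpkm(gene_fragments, gene_length_bp, approx_fragments)
fpkm_by_entries = fpkm(gene_fragments, gene_length_bp, mapped_entries)

# Dividing by the read-level count roughly halves the value.
print(fpkm_by_fragments, fpkm_by_entries)
```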


Since you asked:

  • Mapped reads are reads that found a match on the reference sequence, given the allowed mismatches / indels and all other constraints that you applied.

  • Proper pairs are pairs of reads that both map and lie within the expected insert size of each other. The insert size is a property of the sequencing library that you should know, have received with the data, or have inferred from the TLEN field of the BAM file produced by aligning a subset of reads.
