Hi all,
I have a very basic question here. With paired-end sequencing, how do we count the number of mapped reads?
I ran flagstat on my file, which I had already filtered to keep only reads with flags 83 and 163 (mapped and properly paired).
The flagstat results are as below:
69640 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
69640 + 0 mapped (100.00%:-nan%)
69640 + 0 paired in sequencing
34820 + 0 read1
34820 + 0 read2
69640 + 0 properly paired (100.00%:-nan%)
69640 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
So, from the results, does 69640 mean that 69640 individual reads mapped, or that 69640 read pairs mapped (as in R2---R1 is counted as 1 read)? Should we divide the number by 2 if each end of a pair was counted separately?
A bit confused here. Hope someone can help. Thank you very much.
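For anyone following along, here is a minimal sketch of what the two flags in the filter encode. The bit meanings come from the FLAG field of the SAM specification; `decode_flag` and `SAM_FLAG_BITS` are hypothetical names made up for this illustration.

```python
# Bit meanings of the SAM FLAG field (per the SAM specification).
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for flag in (83, 163):
    print(flag, decode_flag(flag))
```

Both flags have the "proper pair" bit set. 83 marks a read 1 aligned to the reverse strand, and 163 marks a read 2 whose mate is on the reverse strand, so each flag describes one end of a pair.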
Thanks Pierre for the reply.
But I still don't quite understand. I attached a picture to make my query clearer.
image url
In the picture, flagstat shows 6 reads: 3 from R1 and 3 from R2. I understand that. If you view the SAM file in a genome viewer like IGV, it looks like the first image, but if I view the reads as pairs, it looks like the second one. So in the second image we actually have only 3 fragments of the gene mapped, each made up of a pair of reads. So shouldn't we divide the number of reads by 2 to get the total number of fragments mapped? The number given by flagstat is the number of individual reads, without taking the pairing into account, right?
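The arithmetic being asked about can be sketched on the flagstat output from the first post. This is only an illustration: `flagstat_count` is a hypothetical helper that assumes the `N + M label` line layout shown above, and the halving assumes every counted record is one end of a complete pair.

```python
import re

# flagstat output copied from the first post.
FLAGSTAT_TEXT = """\
69640 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
69640 + 0 mapped (100.00%:-nan%)
69640 + 0 paired in sequencing
34820 + 0 read1
34820 + 0 read2
69640 + 0 properly paired (100.00%:-nan%)
69640 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
"""


def flagstat_count(text, label):
    """Return the QC-passed count of the first line whose label starts with `label`."""
    for line in text.splitlines():
        match = re.match(r"(\d+) \+ (\d+) (.*)", line)
        if match and match.group(3).startswith(label):
            return int(match.group(1))
    raise KeyError(label)


properly_paired = flagstat_count(FLAGSTAT_TEXT, "properly paired")
read1 = flagstat_count(FLAGSTAT_TEXT, "read1")
read2 = flagstat_count(FLAGSTAT_TEXT, "read2")

# Each fragment contributes one read1 record and one read2 record.
fragments = properly_paired // 2
print(properly_paired, "reads ->", fragments, "fragments")
```

Here the halved number (34820) matches both the read1 and the read2 counts, which is consistent with flagstat counting each end of a pair as its own record.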
Yes, and this number would be "properly paired" / 2.
Again, the number of 'correct fragments' would be "properly paired" / 2.
OK, that clarifies a lot of things.
But then, what about when the "properly paired" number is odd? It's not always an even number, right? For example, R1 can map to 2 different R2 fragments; how do you count those?
I am quite new to sequencing, so I would like to know what the general consensus is on reporting the number of reads in paired-end sequencing: the total number of reads, or that number divided by 2, i.e. the 'correct fragments'?
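One way to sidestep the odd-number problem is to count distinct read names instead of halving a record count. The sketch below is only an illustration under stated assumptions: both mates share the same QNAME, secondary (0x100) and supplementary (0x800) records are ignored, and the toy records are invented and truncated to their first four columns. `count_fragments` is a hypothetical helper, not part of any tool.

```python
PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_fragments(sam_lines):
    """Count distinct read names among primary, properly paired records."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # extra alignments of a read already counted
        if flag & PROPER_PAIR:
            names.add(qname)
    return len(names)


# Invented toy records (QNAME, FLAG, RNAME, POS only), not real data.
toy_sam = [
    "@HD\tVN:1.6",
    "frag1\t83\tchr1\t300",
    "frag1\t163\tchr1\t100",
    "frag2\t83\tchr1\t700",
    "frag2\t163\tchr1\t500",
    "frag3\t83\tchr1\t1100",
    "frag3\t163\tchr1\t900",
    "frag1\t419\tchr1\t5000",  # 419 = 163 + 256: a secondary alignment
]

records = sum(1 for line in toy_sam if not line.startswith("@"))
print(records, "records,", count_fragments(toy_sam), "fragments")
```

With 7 records in the toy input, dividing by 2 does not give a whole number, while counting by name still gives 3 fragments. Whichever number is reported, it helps to state explicitly whether it refers to reads or to fragments (read pairs).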