Question

Getting Confused With The Flagstat After PCR Duplicates Removed

1

13.3 years ago

KJ Lim ▴ 140

Good day.

I ran into the situation shown below.

This is the flagstat output for the paired-end mapped reads before PCR duplicates were removed:

:::::::::::::: 
0H.flagstat.txt
::::::::::::::
173146136 + 0 in total 
0 + 0 duplicates
130510023 + 0 mapped (75.38%:nan%)
173146136 + 0 paired in sequencing
86573068 + 0 read1  <--
86573068 + 0 read2  <--
87873910 + 0 properly paired (50.75%:nan%)
87873910 + 0 with itself and mate mapped
42636113 + 0 singletons (24.62%:nan%)

This is the flagstat output for the same paired-end mapped reads after PCR duplicates were removed with the Picard MarkDuplicates tool:

::::::::::::::
0H.ptFlagstat.txt
::::::::::::::
49080460 + 0 in total 
0 + 0 duplicates
6444347 + 0 mapped (13.13%:nan%)
49080460 + 0 paired in sequencing
45547041 + 0 read1  <--
3533419 + 0 read2   <--
5822436 + 0 properly paired (11.86%:nan%)
5822436 + 0 with itself and mate mapped
621911 + 0 singletons (1.27%:nan%)

The read1 and read2 counts are different after the PCR duplicates were removed. Has anyone here run into the same situation?

I'm also confused by the phrases "paired in sequencing" and "properly paired". Could anyone kindly share their thoughts? The numbers shown for these two are different.

picard pcr duplicates sam bam • 3.8k views

ADD COMMENT • link updated 13.3 years ago by swbarnes2 15k • written 13.3 years ago by KJ Lim ▴ 140

score 1 · Answer 1 · 2012-04-19

Your results do look a bit strange. As far as I know, the "read1" value plus the "read2" value should always equal the "mapped" value. For you, the sum equals the "paired in sequencing" value instead. By the way, the read1 and read2 values do not need to be equal; in fact, I have never seen them match before. (Usually the number of read1s that align is not exactly the same as the number of read2s.)
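The sums mentioned above can be checked directly against the two reports in the question. This is only a small arithmetic sketch in Python: the counts are copied from the posted flagstat output, while the names `before`, `after`, and `summarize` are made up for illustration.

```python
# Counts copied from the two flagstat reports posted in the question.
before = {"total": 173146136, "mapped": 130510023, "paired": 173146136,
          "read1": 86573068, "read2": 86573068}
after = {"total": 49080460, "mapped": 6444347, "paired": 49080460,
         "read1": 45547041, "read2": 3533419}


def summarize(report):
    """Compare read1 + read2 with the 'paired' and 'mapped' counts."""
    pair_sum = report["read1"] + report["read2"]
    return {
        "read1_plus_read2": pair_sum,
        "equals_paired": pair_sum == report["paired"],
        "equals_mapped": pair_sum == report["mapped"],
        # Percentage of all reads that are mapped, rounded like flagstat's report.
        "mapped_pct": round(100 * report["mapped"] / report["total"], 2),
    }


for name, report in (("before", before), ("after", after)):
    print(name, summarize(report))
```

In both reports the read1 + read2 sum equals the "paired in sequencing" count, not the "mapped" count.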

"Paired in sequencing" is the number of paired reads among the total reads (usually equal to this number, although you could in principle have a mix of paired-end and single-end reads in a BAM/SAM file). "Properly paired" is the number of alignments where the "properly paired" SAM flag is set. This is done by the aligner, so it depends on the aligner how that is defined. Generally, it means that read 1 and read 2 align within some maximum distance of each other and in the correct orientation (if applicable).

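To make the flag-based definitions above concrete, here is a simplified Python sketch of how a flagstat-style tally could be derived from SAM flag bits. The bit values come from the SAM specification. The `tally` function and its category names are hypothetical, this is not samtools' actual code, and it ignores secondary, supplementary, duplicate, and QC-fail records.

```python
# SAM flag bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
READ1 = 0x40
READ2 = 0x80


def tally(flags):
    """Tally flagstat-style categories from an iterable of integer SAM flags."""
    names = ["total", "mapped", "paired", "read1", "read2",
             "properly_paired", "both_mapped", "singletons"]
    counts = dict.fromkeys(names, 0)
    for flag in flags:
        mapped = not (flag & UNMAPPED)
        mate_mapped = not (flag & MATE_UNMAPPED)
        counts["total"] += 1
        if mapped:
            counts["mapped"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
            # Assumption: read1/read2 are counted whether or not the read mapped.
            if flag & READ1:
                counts["read1"] += 1
            if flag & READ2:
                counts["read2"] += 1
            if mapped and (flag & PROPER_PAIR):
                counts["properly_paired"] += 1
            if mapped and mate_mapped:
                counts["both_mapped"] += 1
            if mapped and not mate_mapped:
                counts["singletons"] += 1
    return counts


# Made-up example flags: a proper pair (99, 147), a mapped read1 whose mate
# is unmapped (73) with that mate (133), and a fully unmapped pair (77, 141).
print(tally([99, 147, 73, 133, 77, 141]))
```

In this sketch, read1 and read2 are counted for every record flagged as paired, so their sum matches the "paired" count rather than the "mapped" count. If flagstat tallies them the same way, that would explain the sums seen in the question.
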
score 0 · Answer 2 · 2012-04-19

13.3 years ago

swbarnes2 15k

I'm not all that clear on what MarkDuplicates does with reads and read pairs where one or both ends don't map.

Maybe Read 2 mapped much better than Read 1. That could be why MarkDuplicates took away so much more of it, leaving your read 1 data full of unmapped reads that MarkDuplicates left alone.

You can use samtools view to dissect how many read 1s and read 2s are properly paired versus just mapped versus unmapped.
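One way to get the breakdown suggested above is to read the flag column (column 2) of the SAM text that `samtools view` prints and count each combination. The sketch below is a minimal Python version. The function names `classify` and `breakdown` are hypothetical, and the example records are made up for illustration.

```python
from collections import Counter

# SAM flag bits as defined in the SAM specification.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
READ1 = 0x40
READ2 = 0x80


def classify(flag):
    """Return (mate, status) for one integer SAM flag."""
    if flag & READ1:
        mate = "read1"
    elif flag & READ2:
        mate = "read2"
    else:
        mate = "unpaired"
    if flag & UNMAPPED:
        status = "unmapped"
    elif flag & PROPER_PAIR:
        status = "properly_paired"
    else:
        status = "mapped_only"
    return mate, status


def breakdown(sam_lines):
    """Count (mate, status) combinations over SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # column 2 holds the flag
        counts[classify(flag)] += 1
    return counts


# Made-up records for illustration; real input would come from samtools view.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r1\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tACGT\tIIII",
    "r2\t73\tchr1\t500\t60\t4M\t=\t500\t0\tACGT\tIIII",
    "r2\t133\tchr1\t500\t0\t*\t=\t500\t0\tACGT\tIIII",
]
for (mate, status), n in sorted(breakdown(example).items()):
    print(mate, status, n, sep="\t")
```

For a real file, the same function could be fed `sys.stdin` while piping in the output of `samtools view in.bam`, where `in.bam` is a placeholder file name. A single category can also be counted directly with samtools: for example, `samtools view -c -f 66 in.bam` counts records that have both the "properly paired" (0x2) and "first in pair" (0x40) bits set.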

0

Thanks, swbarnes2, for your answer.

Could you kindly elaborate on this: "You can use samtools view to disect how many read 1 and read 2's are properly paired versus just mapped versus unmapped"? Thanks.

I'm still learning how to use samtools.

ADD REPLY • link 13.3 years ago by KJ Lim ▴ 140