Good day.
I ran into the following situation.
This is the flagstat output for the paired-end mapped reads before PCR duplicates were removed:
::::::::::::::
0H.flagstat.txt
::::::::::::::
173146136 + 0 in total
0 + 0 duplicates
130510023 + 0 mapped (75.38%:nan%)
173146136 + 0 paired in sequencing
86573068 + 0 read1 <--
86573068 + 0 read2 <--
87873910 + 0 properly paired (50.75%:nan%)
87873910 + 0 with itself and mate mapped
42636113 + 0 singletons (24.62%:nan%)
This is the flagstat output for the same paired-end mapped reads after PCR duplicates were removed with the Picard MarkDuplicates tool:
::::::::::::::
0H.ptFlagstat.txt
::::::::::::::
49080460 + 0 in total
0 + 0 duplicates
6444347 + 0 mapped (13.13%:nan%)
49080460 + 0 paired in sequencing
45547041 + 0 read1 <--
3533419 + 0 read2 <--
5822436 + 0 properly paired (11.86%:nan%)
5822436 + 0 with itself and mate mapped
621911 + 0 singletons (1.27%:nan%)
The read1 and read2 counts were equal before duplicate removal (86573068 each), but they differ after the PCR duplicates were removed (45547041 vs. 3533419). Has anyone else seen this?
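To make the imbalance concrete, here is a small sanity check on the numbers from the two reports above. It is only a sketch: the dictionary keys and the `check` helper are names made up for illustration, and the counts are copied by hand from the flagstat output in this post.

```python
# Counts copied by hand from the two flagstat reports in this post.
# The key names are illustrative, not samtools field names.
before = {"total": 173146136, "mapped": 130510023,
          "read1": 86573068, "read2": 86573068,
          "both_mapped": 87873910, "singletons": 42636113}
after = {"total": 49080460, "mapped": 6444347,
         "read1": 45547041, "read2": 3533419,
         "both_mapped": 5822436, "singletons": 621911}


def check(counts):
    """Return simple consistency figures for one flagstat report."""
    return {
        # Every paired read is either read1 or read2.
        "read1_plus_read2_equals_total":
            counts["read1"] + counts["read2"] == counts["total"],
        # A mapped paired read either has a mapped mate or is a singleton.
        "mapped_equals_pairs_plus_singletons":
            counts["both_mapped"] + counts["singletons"] == counts["mapped"],
        # Zero means the two mates are still balanced.
        "read1_minus_read2": counts["read1"] - counts["read2"],
        "mapped_percent": round(100 * counts["mapped"] / counts["total"], 2),
    }


if __name__ == "__main__":
    for label, counts in (("before", before), ("after", after)):
        print(label, check(counts))
```

Both reports are internally consistent (read1 + read2 adds up to the total in each), so the imbalance is not a counting error in flagstat: far more read2 records than read1 records were dropped between the two files.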
I'm also confused by the phrases "paired in sequencing" and "properly paired", since the numbers reported for the two are different. Could anyone kindly share their thoughts?
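For reference, here is a minimal sketch of how the flagstat categories relate to the FLAG bits defined in the SAM specification, as far as I understand it. The bit values are from the specification. The `flagstat_categories` function is a hypothetical helper, and its counting rules are an approximation of what samtools flagstat does; the exact rules may differ between samtools versions.

```python
# FLAG bits as defined in the SAM specification.
PAIRED = 0x1          # read was paired in sequencing
PROPER_PAIR = 0x2     # aligner judged both mates properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
READ1 = 0x40          # first read of the pair
READ2 = 0x80          # second read of the pair


def flagstat_categories(flag):
    """Return the flagstat categories a single SAM FLAG value falls into.

    Hypothetical helper: an approximation of the flagstat counting
    rules, not a copy of the samtools implementation.
    """
    cats = set()
    mapped = not flag & UNMAPPED
    if mapped:
        cats.add("mapped")
    if flag & PAIRED:
        cats.add("paired in sequencing")
        if flag & READ1:
            cats.add("read1")
        if flag & READ2:
            cats.add("read2")
        if mapped:
            if flag & PROPER_PAIR:
                cats.add("properly paired")
            if flag & MATE_UNMAPPED:
                cats.add("singletons")
            else:
                cats.add("with itself and mate mapped")
    return cats


if __name__ == "__main__":
    # 99: paired, proper pair, read1; 73: paired, mate unmapped, read1;
    # 133: paired, unmapped, read2.
    for flag in (99, 73, 133):
        print(flag, sorted(flagstat_categories(flag)))
```

Under these rules, "paired in sequencing" only says that the read came from a paired-end library, whether or not it mapped, while "properly paired" additionally requires that the aligner set the proper-pair bit. That would explain why the second number is always the smaller one.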
Thanks Mikael for the explanation.
I mapped the SOLiD csfasta reads against a pseudogenome (a collection of EST sequences from the genus), as no complete genome is available; it is a non-model plant species. I used SHRiMP2 for the mapping, with the --half-paired option on (it is on by default as of v2.2.0).