Hi, I have recently been aligning WGBS data. After mapping to the genome with BSMAP, the samtools flagstat results show that most of the aligned reads are not properly paired:
578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)
According to the last two lines above, a very large number of reads have a mate mapped to a different chromosome, and this worries me. As far as I know, and according to this post: filtering paired end mapped reads form SAM/BAM file, this can happen because of chromosomal rearrangements (e.g. in cancer samples), artifacts introduced during library prep, or poor mapping quality. But my samples are not from cancer cells or tissues, and the last flagstat line (mapQ>=5) suggests that poor mapping quality is not the cause either. So my questions are: should I remove or keep these improperly paired reads? And why are there so many of them? Looking forward to your kind help. Thank you so much!
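To put those counts in proportion, here is a minimal Python sketch that parses the flagstat text quoted above and expresses the main categories as a fraction of the reads that are paired in sequencing. The function name `parse_flagstat` is only an illustrative helper, not part of samtools, and the parsing assumes the plain-text flagstat layout shown in this post.

```python
import re

# The flagstat output quoted in the question, copied verbatim.
FLAGSTAT = """\
578132580 + 0 in total (QC-passed reads + QC-failed reads)
30679544 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
578132580 + 0 mapped (100.00% : N/A)
547453036 + 0 paired in sequencing
285250897 + 0 read1
262202139 + 0 read2
61623 + 0 properly paired (0.01% : N/A)
418641102 + 0 with itself and mate mapped
128811934 + 0 singletons (23.53% : N/A)
396746190 + 0 with mate mapped to a different chr
396746190 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Return {label: QC-passed count} for each flagstat line (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ (\d+) (.+)", line)
        if m:
            # Drop a trailing "(xx% : N/A)" annotation, keep other parentheses.
            label = re.sub(r"\s*\([^()]*:[^()]*\)\s*$", "", m.group(3))
            counts[label] = int(m.group(1))
    return counts


counts = parse_flagstat(FLAGSTAT)
paired = counts["paired in sequencing"]

for label in ("properly paired",
              "singletons",
              "with mate mapped to a different chr"):
    print(f"{label}: {counts[label] / paired:.2%} of paired reads")
```

Dividing by the "paired in sequencing" count rather than the total keeps secondary alignments out of the denominator, which matches how the percentages in the flagstat output above are computed.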
Is it a mate-pair dataset?
As you can see from this line:
I think it's a mate-pair dataset. :)
Not necessarily; paired-end and mate-pair are two different techniques. Your problem could be caused by library preparation or by wrong aligner settings. I'm not sure how flagstat counts "different chr". If you check a pair of reads with the same ID, are they really mapped to two different chromosomes? If the chromosome is the same, what distance do you get between the reads?
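The check suggested above could be sketched roughly as follows in Python. It groups primary alignments by read ID and reports whether the two mates sit on different chromosomes or, if not, how far apart their start positions are. The SAM records here are made up for illustration, and `summarize_pairs` is a hypothetical helper name; only the standard SAM column order and FLAG bits are assumed.

```python
from collections import defaultdict

# Made-up SAM records for illustration only (tab-separated, 11 mandatory fields).
SAM_LINES = [
    "readA\t65\tchr1\t1000\t60\t50M\tchr5\t2000\t0\t*\t*",
    "readA\t129\tchr5\t2000\t60\t50M\tchr1\t1000\t0\t*\t*",
    "readB\t99\tchr2\t500\t60\t50M\t=\t700\t250\t*\t*",
    "readB\t147\tchr2\t700\t60\t50M\t=\t500\t-250\t*\t*",
    "readC\t73\tchr3\t100\t60\t50M\t*\t0\t0\t*\t*",
]


def summarize_pairs(sam_lines):
    """Classify each read ID as 'different_chr', 'same_chr' (with distance) or 'single'."""
    mates = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & (0x4 | 0x100 | 0x800):
            continue
        mates[qname].append((rname, pos))

    summary = {}
    for qname, recs in mates.items():
        if len(recs) != 2:
            summary[qname] = ("single", None)
        elif recs[0][0] != recs[1][0]:
            summary[qname] = ("different_chr", None)
        else:
            # Distance between leftmost mapped positions of the two mates.
            summary[qname] = ("same_chr", abs(recs[0][1] - recs[1][1]))
    return summary


for read_id, (status, distance) in summarize_pairs(SAM_LINES).items():
    print(read_id, status, distance)
```

On real data the lines would come from `samtools view` output rather than a hard-coded list. Because this sketch keeps every read ID in memory, it is better run on a subsample than on the full set of 578 million records.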
What is the quality of your reference genome? Is it human, or some other species?
The reference genome is hg19, downloaded from UCSC.
Honestly, it seems something unpaired the R1 and R2 reads. What were your processing steps before mapping? How did you remove adapters, trim by quality, etc.?
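One way to test whether trimming or filtering left R1 and R2 out of sync is to walk both FASTQ files in step and compare read IDs. The Python sketch below does that on small in-memory examples. It assumes strict 4-line FASTQ records with no blank lines, and read IDs that match once a trailing `/1` or `/2` is removed. The helper names `read_ids` and `first_mismatch` are illustrative only.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of each 4-line FASTQ record (assumes no blank lines)."""
    for i, line in enumerate(handle):
        if i % 4 == 0:                       # header line of each record
            name = line[1:].split()[0]       # drop '@' and any comment field
            if name.endswith(("/1", "/2")):  # drop old-style mate suffix
                name = name[:-2]
            yield name


def first_mismatch(r1_handle, r2_handle):
    """Return (record_index, id1, id2) for the first out-of-sync record, else None."""
    pairs = zip_longest(read_ids(r1_handle), read_ids(r2_handle))
    for n, (id1, id2) in enumerate(pairs):
        if id1 != id2:                       # also catches files of unequal length
            return (n, id1, id2)
    return None


# Made-up records: in R2_BAD, read2 was dropped (e.g. by single-end trimming).
R1 = "@read1/1\nACGT\n+\nIIII\n@read2/1\nACGT\n+\nIIII\n@read3/1\nACGT\n+\nIIII\n"
R2_OK = "@read1/2\nTTGA\n+\nIIII\n@read2/2\nTTGA\n+\nIIII\n@read3/2\nTTGA\n+\nIIII\n"
R2_BAD = "@read1/2\nTTGA\n+\nIIII\n@read3/2\nTTGA\n+\nIIII\n"

print(first_mismatch(io.StringIO(R1), io.StringIO(R2_OK)))
print(first_mismatch(io.StringIO(R1), io.StringIO(R2_BAD)))
```

For real files, the `io.StringIO` objects would be replaced by opened file handles (or `gzip.open(path, "rt")` for compressed FASTQ). A mismatch early in the files would point to the two mates having been trimmed or filtered independently.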