Hi all, I am trying to use samtools flagstat to determine whether my BAM files contain duplicates. I do not have the FastQC files for these BAMs. They had already been aligned (with BWA, I believe) before being sent over. The samtools flagstat statistics before and after running Picard's MarkDuplicates tool are as follows.
samtools flagstat before running MarkDuplicates:
54020790 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
51506397 + 0 mapped (95.35%:-nan%)
54020790 + 0 paired in sequencing
27010395 + 0 read1
27010395 + 0 read2
50485840 + 0 properly paired (93.46%:-nan%)
50592454 + 0 with itself and mate mapped
913943 + 0 singletons (1.69%:-nan%)
63320 + 0 with mate mapped to a different chr
48093 + 0 with mate mapped to a different chr (mapQ>=5)
samtools flagstat after running MarkDuplicates:
54020790 + 0 in total (QC-passed reads + QC-failed reads)
2723962 + 0 duplicates
51506397 + 0 mapped (95.35%:-nan%)
54020790 + 0 paired in sequencing
27010395 + 0 read1
27010395 + 0 read2
50485840 + 0 properly paired (93.46%:-nan%)
50592454 + 0 with itself and mate mapped
913943 + 0 singletons (1.69%:-nan%)
63320 + 0 with mate mapped to a different chr
48093 + 0 with mate mapped to a different chr (mapQ>=5)
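To compare the two reports above without reading them line by line, here is a minimal Python sketch that parses flagstat text into counts. It assumes the "N + M label" layout shown in this post (QC-passed count, then QC-failed count); the function names `parse_flagstat` and `duplicate_fraction` are made up for illustration and are not part of samtools or Picard. Only a few lines of each report are pasted in to keep it short.

```python
import re

# One flagstat line looks like: "<QC-passed> + <QC-failed> <label>"
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing percentage block such as " (95.35%:-nan%)"
PCT_RE = re.compile(r"\s*\([^()]*%[^()]*\)$")


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} for each recognised line."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        label = PCT_RE.sub("", m.group(3))
        counts[label] = (int(m.group(1)), int(m.group(2)))
    return counts


def duplicate_fraction(counts):
    """QC-passed duplicates divided by QC-passed total reads."""
    total = next(v[0] for k, v in counts.items() if k.startswith("in total"))
    return counts["duplicates"][0] / total


# Excerpts copied from the two reports in this post.
before_text = """
54020790 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
51506397 + 0 mapped (95.35%:-nan%)
"""
after_text = """
54020790 + 0 in total (QC-passed reads + QC-failed reads)
2723962 + 0 duplicates
51506397 + 0 mapped (95.35%:-nan%)
"""

before = parse_flagstat(before_text)
after = parse_flagstat(after_text)

# Report every category whose counts differ between the two runs.
for label in after:
    if before.get(label) != after[label]:
        print(f"{label}: {before.get(label)} -> {after[label]}")

print(f"duplicate fraction after marking: {duplicate_fraction(after):.2%}")
```

With these numbers, the only category that changes is "duplicates", and the marked duplicates come to roughly 5% of all reads.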
So, based on these results, I am assuming that the duplicates had not been marked in the original BAM file. Am I right in assuming this? Also, if someone could provide a simple explanation of how to interpret the samtools flagstat output, it would be really helpful. Thanks.
Thank you so much. That was my major confusion, since I only received the BAM files and not the FastQC ones.
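On interpreting the output: each flagstat line is a count of reads whose SAM FLAG field has certain bits set or unset, reported as "QC-passed + QC-failed". The duplicate bit is 0x400, so a count of 0 duplicates means no read carries that bit, which happens both when there are no duplicates and when nobody has marked them yet. The Python sketch below maps a single FLAG value to the flagstat categories it would be counted under. The bit values come from the SAM specification; the function name `flagstat_categories` is hypothetical, and the logic is simplified (it ignores secondary and supplementary alignments and the "different chr" lines, which need the mate's reference name rather than the flag alone).

```python
# SAM FLAG bits used by the flagstat categories in this post.
PAIRED = 0x1          # read is paired in sequencing
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped
READ1 = 0x40          # first read of the pair
READ2 = 0x80          # second read of the pair
QC_FAIL = 0x200       # read failed quality checks (the "+ M" column)
DUPLICATE = 0x400     # marked as PCR or optical duplicate


def flagstat_categories(flag):
    """List the flagstat lines a read with this FLAG contributes to."""
    cats = []
    mapped = not (flag & UNMAPPED)
    if flag & DUPLICATE:
        cats.append("duplicates")
    if mapped:
        cats.append("mapped")
    if flag & PAIRED:
        cats.append("paired in sequencing")
        if flag & READ1:
            cats.append("read1")
        if flag & READ2:
            cats.append("read2")
        if (flag & PROPER_PAIR) and mapped:
            cats.append("properly paired")
        if mapped and not (flag & MATE_UNMAPPED):
            cats.append("with itself and mate mapped")
        if mapped and (flag & MATE_UNMAPPED):
            cats.append("singletons")
    return cats


# FLAG 99 = 0x1 + 0x2 + 0x20 + 0x40: a properly paired, mapped read1.
print(99, flagstat_categories(99))
# FLAG 1123 = 99 + 0x400: the same read after being marked as a duplicate.
print(1123, flagstat_categories(1123))
# FLAG 73 = 0x1 + 0x8 + 0x40: a mapped read1 whose mate is unmapped.
print(73, flagstat_categories(73))
```

Marking duplicates only sets the 0x400 bit on existing reads, which is consistent with every other line being identical in the two reports above.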