Hi, I aligned my paired-end sequencing data using Bowtie2 and processed the resulting BAM files with Picard MarkDuplicates. However, I noticed a discrepancy between the unique alignment statistics reported by Bowtie2 and the read count obtained from Samtools.
Bowtie2 alignment statistics:
8536975 reads; of these:
5180282 (60.68%) aligned 0 times (unmapped)
1868933 (21.89%) aligned exactly 1 time (uniquely mapped)
1487760 (17.43%) aligned >1 times (multi-mapped)
Overall alignment rate: 39.32%
Uniquely mapped reads reported by Bowtie2: 1,868,933
Samtools read count after removing duplicates (samtools view -c -q 30 CR05NFYA_S1_mdu.bam):
2,877,386
Samtools reports ~1M more uniquely mapped reads than Bowtie2.
Why is the uniquely mapped read count in my filtered BAM (_mdu.bam) significantly higher than what Bowtie2 originally reported?
What I Have Tried:
Verified read counts before and after duplicate marking (samtools view -c -F 1024).
Checked the MAPQ distribution (samtools view CR05NFYA_S1_mdu.bam | awk '{print $5}' | sort -n | uniq -c).
Excluded secondary alignments (samtools view -c -F 256 CR05NFYA_S1_mdu.bam).
Please also post the exact output of samtools flagstat for this BAM.
The verbiage is notoriously confusing and misleading. Even the other day, after more than a decade of reading these stats, I was briefly confused by the language they use: for example, alignments are called reads.
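To make the "alignments are called reads" point concrete: samtools view -c counts alignment records, so with paired-end data an aligned pair can contribute two records (one per mate), and -q 30 is a MAPQ cutoff rather than Bowtie2's "aligned exactly 1 time" category. Below is a minimal sketch for comparing the number of records against the number of distinct read names. It runs on made-up SAM lines; the helper name count_records_and_names and the toy records are illustrative assumptions, not part of any tool.

```shell
# Count alignment records vs. distinct read names in headerless SAM text on stdin.
# Columns used: 1 = QNAME (read name), 5 = MAPQ.
# Optional first argument: minimum MAPQ to keep (defaults to 0, i.e. keep everything).
count_records_and_names() {
  awk -F'\t' -v minq="${1:-0}" '
    $5 + 0 >= minq {
      records++                                   # one per alignment record
      if (!($1 in seen)) { seen[$1] = 1; names++ } # one per distinct read name
    }
    END { printf "records=%d names=%d\n", records, names }
  '
}

# Toy input (hypothetical records): pairA has both mates at MAPQ 42,
# pairB has one mate at MAPQ 40 and the other at MAPQ 3.
printf 'pairA\t99\tchr1\t100\t42\npairA\t147\tchr1\t250\t42\npairB\t99\tchr1\t500\t40\npairB\t147\tchr1\t650\t3\n' |
  count_records_and_names 30
# prints: records=3 names=2
```

On real data, something like samtools view -q 30 -F 0x904 CR05NFYA_S1_mdu.bam | count_records_and_names would feed it mapped, primary records only (0x904 excludes unmapped, secondary, and supplementary records). If names comes out at roughly half of records, the gap is probably the two mates of each pair being counted separately.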