I have 3 Assemblies (MT, MG and Combined) and I mapped 2 different sets of reads (MT / MG) to each of these assemblies. I noticed that the number of "total reads" that is reported for the resulting bam files varies, even though I originally used the same number of reads.
For the mapping I used BWA mem for paired reads and for singletons and then merged the 2 resulting files, before sorting and indexing:
samtools merge -@ ${THREADS} -f $PREFIX.merged.bam \
<(bwa mem -v 1 -t ${THREADS} -M -R \"$SAMHEADER\" ${ASSEMBLY} ${PE1} ${PE2} | \
samtools view -@ ${THREADS} -bS -) \
<(bwa mem -v 1 -t ${THREADS} -M -R \"$SAMHEADER\" ${ASSEMBLY} ${SE} | \
samtools view -@ ${THREADS} -bS -)
I counted the reads in the resulting bam files (also with flagstat and idxstats) and got the following numbers:
parallel "echo {};samtools view -c {}" ::: *.bam
MT reads in fastq files combined: 3236023
MT_vs_MT.sorted.bam
3236025
MT_vs_MG.sorted.bam
3236593
MT_vs_Combined.sorted.bam
3236408
MG reads in fastq files combined: 19920596
MG_vs_Combined.sorted.bam
19961083
MG_vs_MT.sorted.bam
19920896
MG_vs_MG.sorted.bam
19951246
Why is the number of total reads in the alignment files different to the original number and between the files?
Thank you in advance.
How many are marked as secondary and or supplementary?
Indeed, some reads can map equally well to multiple locations, causing this discrepancy.
supplementary: samtools view -c -f 2048 : 0 everywhere
non primary: samtools view -c -f 256
And yes these numbers add up to the total read numbers... Thanks for your answer. How do I close this or mark as accepted?