Entering edit mode
5 months ago
shpak.max
▴
50
I ran bwa mem on a small fastq file (a truncated version of my actual data set) to make sure that everything was in order, e.g.
bwa mem ref.fasta testing_1.fastq testing_2.fastq > testing.sam
samtools view -bS testing.sam > testing.bam
I counted the number of mapped reads in testing.bam:
samtools view -c testing.bam
460050
However, there are 917532 lines in the paired-end fastq files. Dividing this by 4 should give me the number of reads, i.e. 229383, but this quantity is less than half the number of mapped reads according to samtools view, so clearly something is wrong.
In contrast, samtools flagstat indicates that 91.1% of reads were mapped, so I'm not sure how to make sense of the previous mismatched counts.
you have to subtract the number of secondary and supplementary mappings. See option
-F
of samtools view .Running samtools flagstat, I get 458766 primary mappings. This still doesn't come close to accounting for how the number of mapped reads could exceed the total read number. As far as I can tell, dividing the number of lines in a fastq by 4 should give me the total number of reads, which remains much smaller than either the total or primary mapping.
458766 is double the number of your supposed reads (dividing by four the number of lines). Are you sure you have fastq files?