Mismatch between number of reads in fastq and mapped reads
0
0
Entering edit mode
5 months ago
shpak.max ▴ 50

I ran bwa mem on a small fastq file (a truncated version of my actual data set) to make sure that everything was in order, e.g.

bwa mem ref.fasta testing_1.fastq testing_2.fastq > testing.sam
samtools view -bS testing.sam > testing.bam

I counted the number of mapped reads in testing.bam:

samtools view -c testing.bam
460050

However, there are 917532 lines in the paired-end fastq files. Dividing this by 4 should give me the number of reads, i.e. 229383, but this quantity is less than half the number of mapped reads according to samtools view, so clearly something is wrong.

In contrast, samtools flagstat indicates that 91.1% of reads were mapped, so I'm not sure how to make sense of the previous mismatched counts.

samtools bam • 397 views
ADD COMMENT
1
Entering edit mode

you have to subtract the number of secondary and supplementary mappings. See option -F of samtools view .

ADD REPLY
0
Entering edit mode

Running samtools flagstat, I get 458766 primary mappings. This still doesn't come close to accounting for how the number of mapped reads could exceed the total read number. As far as I can tell, dividing the number of lines in a fastq by 4 should give me the total number of reads, which remains much smaller than either the total or primary mapping.

ADD REPLY
0
Entering edit mode

458766 is double the number of your supposed reads (dividing by four the number of lines). Are you sure you have fastq files?

ADD REPLY

Login before adding your answer.

Traffic: 1420 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6