I'm having trouble getting BWA mem (v0.7.12) to consistently output all reads in the input fastqs. I'm working on a simulation study, and I'm generating smallish sets of fake reads, then aligning them with bwa. The paired input fastqs have a few thousand reads each. The missing reads are all from the ends of the input fastq files as far as I can tell, and subsetting the input fastqs to focus on reads missing from a previous attempt can make the reads show back up again. It looks a little like under some circumstances BWA is forgetting to flush a buffer, or something like that.
Since I'm simulating reads I know exactly where they should go, and I give each one a unique ID so I can verify which ones are present in the fastqs and in the output sams. The behavior seems identical on both Linux and Mac.
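For anyone who wants to reproduce this kind of ID check, here is a minimal sketch in Python. It is not the exact script used here. It assumes unwrapped four-line fastq records, and it assumes read names may carry a trailing /1 or /2 mate suffix that should be ignored when comparing against the SAM QNAME column. The function names (fastq_ids, sam_ids) are made up for illustration.

```python
from itertools import islice


def fastq_ids(lines):
    """Collect read names from an iterable of fastq lines.

    Assumes unwrapped 4-line records, so only every 4th line is treated
    as a header, even if a quality string happens to start with '@'.
    """
    ids = set()
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        header = record[0].strip()
        name = header[1:].split()[0]  # drop '@' and any description
        if name.endswith(("/1", "/2")):
            name = name[:-2]  # assumption: mate suffix is not in the SAM
        ids.add(name)
    return ids


def sam_ids(lines):
    """Collect QNAMEs (column 1) from an iterable of SAM lines."""
    ids = set()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_ids(fastq_lines, sam_lines):
    """Read names present in the fastq but absent from the SAM."""
    return fastq_ids(fastq_lines) - sam_ids(sam_lines)
```

With real files, each function can be handed an open file object, for example missing_ids(open("reads_1.fq"), open("out.sam")), where both file names are placeholders.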
Are there circumstances under which BWA will refuse to align / output a read, or set of reads? Everything in the fastqs should end up in the .sam, right? Anyone else notice anything similar?
Can you post the output of wc -l on the fastq files and samtools flagstat on the bam file?
Here's a typical run for me:
1250 input read pairs... and 1222 output read pairs.
(I should mention that with this simulated data no base qualities are '@', so the grepping above accurately counts reads - this won't be true for real data)
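For real data, where a quality string can begin with '@', a count based on record position is safer than one based on the leading character. The following is a rough sketch under the assumption of unwrapped four-line fastq records. The function names are hypothetical. The SAM counter skips secondary (0x100) and supplementary (0x800) records, since those would otherwise inflate the output count relative to the input.

```python
def count_fastq_records(lines):
    """Count reads as (non-empty lines) // 4, assuming 4-line records."""
    return sum(1 for line in lines if line.strip()) // 4


def count_at_lines(lines):
    """Count lines starting with '@'; overcounts if a quality line does too."""
    return sum(1 for line in lines if line.startswith("@"))


def count_primary_sam_records(lines):
    """Count SAM records that are neither secondary nor supplementary."""
    n = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x900 == 0:  # 0x100 secondary, 0x800 supplementary
            n += 1
    return n
```

For paired input, the primary record count from the SAM should equal the sum of the record counts of the two fastq files if every read was written out.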
What command line(s) did you use to go from fastq to bam?