BWA missing reads in output
0
0
Entering edit mode
9.0 years ago

I'm having trouble getting BWA mem (v0.7.12) to consistently output all reads in the input fastqs. I'm working on a simulation study, and I'm generating smallish sets of fake reads, then aligning them with bwa. The paired input fastqs have a few thousand reads each. The missing reads are all from the ends of the input fastq files as far as I can tell, and subsetting the input fastqs to focus on reads missing from a previous attempt can make the reads show back up again. It looks a little like under some circumstances BWA is forgetting to flush a buffer, or something like that.

Since I'm simulating reads I know exactly where they should go, and I give them each unique ids so I can verify which ones exist in the fastqs and the output sams. The behavior seems identical on both linux and mac.

Are there circumstances under which BWA will refuse to align / output a read, or set of reads? Everything in the fastqs should end up in the .sam, right? Anyone else notice anything similar?

BWA alignment • 2.9k views
ADD COMMENT
0
Entering edit mode

output wc -l of fastq and samtools flagstat of bam file.

ADD REPLY
0
Entering edit mode

Here's a typical run for me:

bofallon@limital:/tmp-working/$ grep '^@' input_r1.fq | wc -l
1250

bofallon@limital:/tmp-working/$ grep '^@' input_r2.fq | wc -l
1250

bofallon@limital:/tmp-working/$ samtools flagstat input_r1.fq.bam
  1222 + 0 in total (QC-passed reads + QC-failed reads)
  0 + 0 secondary
  0 + 0 supplementary
  0 + 0 duplicates
  1222 + 0 mapped (100.00%:-nan%)
  1222 + 0 paired in sequencing
  611 + 0 read1
  611 + 0 read2
  1222 + 0 properly paired (100.00%:-nan%)
  1222 + 0 with itself and mate mapped
  0 + 0 singletons (0.00%:-nan%)
  0 + 0 with mate mapped to a different chr
  0 + 0 with mate mapped to a different chr (mapQ>=5)

1250 input read pairs... and 1222 output read pairs.

(I should mention that with this simulated data no base qualities are '@', so the grepping above accurately counts reads - this won't be true for real data)

ADD REPLY
0
Entering edit mode

What command line(s) did you use to go from fastq to bam?

ADD REPLY

Login before adding your answer.

Traffic: 2332 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6