Question

When and why is bwa aln better than bwa mem?

16

10.9 years ago

dariober 15k

Hi, as the post title says: when and why should one prefer bwa aln over bwa mem?

The bwa docs say that bwa mem is preferable for longer reads ( > 70 bp). But what is the disadvantage of using bwa mem for shorter reads?

Part of the reason I'm asking is that I have a variety of libraries of read lengths from ~40-70 bp to 150 bp, after quality and adapter trimming, mostly paired-end. I'd rather use one tool for all the read lengths to keep things consistent and bwa mem seems the best choice, unless there is some good reason to avoid it for reads between ~40 and 70 bp.

I have the impression (not tested) that bwa mem is much slower than bwa aln on shorter reads, but that's not an issue for me.

Thanks
Dario

aln comparison bwa mem • 51k views

ADD COMMENT • link 4 months ago by dariober 15k

0

Hi, I am facing a similar problem.

I have 38bp paired-end ChIP-seq data. Should I use bwa aln or bwa mem?

Thanks,
Ming

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 9.9 years ago by Ming Tommy Tang ★ 4.7k

1

Answering my own question: I tested using Teaser (http://teaser.cibiv.univie.ac.at/reports/8dc974f7ce99f6958012619c052e5597/index.html#section4) and bwa aln seems to be a little better than bwa mem for 36 bp single-end reads.

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 8.3 years ago by Ming Tommy Tang ★ 4.7k

Ram · Answer 1 · 2014-10-29

22

10.9 years ago

Istvan Albert 103k

There is the paper that you should read: http://arxiv.org/abs/1303.3997

But beyond that, here is a more practical comparison.

We are running a test in a lecture that focuses on alignment performance. For that, we generated 20,000 reads from the Ebola genome with a pretty high (10%) sequencing error rate. We then ran bowtie2, bwa aln and bwa mem and attempted to align the reads back to the genome. The mapping rates were:

bowtie2: 30%
bwa aln: 25%
bwa mem: 85%

Of course, each of these mapping rates is for default settings that can be changed (see the comments below), but that's where we always start. From those, it looks like bwa mem goes a step further and will find alignments where other methods have already given up.
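For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a mapping rate could be computed from an aligner's SAM output. It is not the script used in the lecture; the function name `mapping_rate` is made up for illustration. It assumes plain-text SAM lines and skips secondary (0x100) and supplementary (0x800) records so that each read is counted once, which matters for bwa mem because it can report split alignments.

```python
def mapping_rate(sam_lines):
    """Fraction of primary SAM records that are mapped (hypothetical helper)."""
    total = 0
    mapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        # Ignore secondary and supplementary records to avoid double counting.
        if flag & (0x100 | 0x800):
            continue
        total += 1
        # Bit 0x4 set means the read is unmapped.
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Tiny made-up example: two mapped reads, one unmapped, one supplementary record.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tchr\t50\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t2048\tchr\t90\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
rate = mapping_rate(example)
print(f"{rate:.2%}")
```

With paired-end data each mate is counted as its own record here, which is one common convention but not the only one.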

ADD COMMENT • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by Istvan Albert 103k

2

Hi Istvan, thanks for the reply. However, I think you misunderstood my question... From the bwa mem paper, the documentation, and your benchmark it appears that bwa mem is always preferable to bwa aln, especially for longer reads. What I'm asking is: When is bwa aln a better choice than bwa mem? Can we forget about bwa aln altogether and just use bwa mem?

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by dariober 15k

4

The way I see it, my test shows that bwa mem is far more robust to errors than any other aligner. Length of the reads don't factor into this. Come to think of it, these were on the short side: 70 bp, which is what wgsim generates by default.

There might be reasons to use aln, but I look at it as a prior step that was necessary to get to the new method; in general there is little reason to keep using it.

In fact the problems caused by misalignment are far more insidious than simply losing some data. Ok so you only get 50% rather than 80% - but no! the reality is far more troubling than that and probably warrants a separate post itself.

An aligner's failure to map a read is not random! What this means is that there is a bias towards certain types of errors occurring in certain parts of the genome/reads. SNP calling on the default bowtie2 alignment generates a large number of seemingly reliably called SNPs that are not actually true (we can tell because we know what the wgsim-simulated genome is). This bias towards certain types of errors makes them look like real signal.

To me this was eye opening - the inability to align is not just data loss - we need to realize that - it may also introduce substantial biases that are then impossible to correct later.

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by Istvan Albert 103k

0

"Length of the reads don't factor into this" That's what I thought as well... The way the mem docs are written suggested to me that reads <70 bp are not recommended for bwa mem, hence my post.

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by dariober 15k

0

Did you try local alignment with bowtie2 (just add --very-sensitive-local)? That's usually the cause of big differences between bwa mem and bowtie2 like you found, though I think bwa mem generally does local alignment better than bowtie2 anyway.

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

0

Making bowtie2 work better for this particular case was a homework due this week and worth 10 extra bonus points. I have not corrected these so I do not know the answer yet :-)

Myself, I tried --very-sensitive and that only partially improved the mapping rate, to 63%. I just ran --very-sensitive-local and that too gives about the same 63%: much better than the original but still well below bwa mem.

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by Istvan Albert 103k

5

I have been educated by my students' homework. Relaxing the seed mismatches to -N 1 has a substantial effect in this case. Combining that with the parameters that represent the --very-sensitive-local option leads to a bowtie2 mapping rate of 91%. Best performing parameter settings:

-D 20 -R 3 -N 1 -L 20
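For context, the bowtie2 manual lists the --very-sensitive-local preset as `-D 20 -R 3 -N 0 -L 20 -i S,1,0.50`, so the settings above differ from it in `-N 1`. Below is a rough sketch of how the full command line might be assembled. Whether `--local` was part of the original run is an assumption on my part, and the index name, file names and the helper `bowtie2_command` are placeholders, not anything from the homework.

```python
import shlex


def bowtie2_command(index, reads, out_sam, seed_mismatches=1):
    """Build a bowtie2 argument list (hypothetical helper, placeholder paths)."""
    return [
        "bowtie2",
        "--local",                     # assumption: local mode, as in --very-sensitive-local
        "-D", "20",                    # give up after 20 consecutive failed seed extensions
        "-R", "3",                     # re-seed up to 3 times for repetitive seeds
        "-N", str(seed_mismatches),    # mismatches allowed in a seed (0 or 1)
        "-L", "20",                    # seed length
        "-x", index,                   # index basename built with bowtie2-build
        "-U", reads,                   # unpaired reads
        "-S", out_sam,                 # SAM output file
    ]


cmd = bowtie2_command("ebola_index", "reads.fq", "bowtie2_out.sam")
print(shlex.join(cmd))
# To actually run it (requires bowtie2 on PATH and a real index):
# import subprocess; subprocess.run(cmd, check=True)
```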

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by Istvan Albert 103k

0

Those are the same settings needed to make bison, which uses bowtie2 internally, perform the same as bwa-meth, which uses bwa mem internally, on an untrimmed dataset, so that makes sense.

ADD REPLY • link updated 4.5 years ago by Ram 45k • written 10.9 years ago by Devon Ryan 105k

score 6 · Answer 2 · 2025-04-30

~10 years later... I've just come across this post from 2024 https://lh3.github.io/2024/09/28/why-is-bwa-aln-still-used (emphasis mine).

Suppose the true alignment of a 32bp read has one mismatch at position 12 and another mismatch at position 22 on the read. With the default setting, bwa-mem will miss this alignment as it requires at least a 19-mer exact match but the longest exact match on the true alignment is 11bp. In theory, we can find the true alignment if we reduce the minimum seed length to 11. However, each 11-mer occurs 1430 times (= 2 × 3 × 10^9 / 4^11) on average in the human genome. Bwa-mem will become impractically slow if it extends each occurrence to find the best hit. Bwa-aln on the other hand does not use exact seeds and can guarantee to find the true alignment. It is better for short reads.
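The arithmetic in the quote is easy to check. Here is a small sketch, assuming a 3 Gbp genome, both strands, and uniformly random sequence (the same simplifications the quoted estimate makes); the function names are mine.

```python
GENOME_SIZE = 3e9  # approximate human genome length used in the quoted estimate


def expected_kmer_hits(k, genome_size=GENOME_SIZE):
    """Expected occurrences of one k-mer on both strands of a random genome."""
    return 2 * genome_size / 4 ** k


def longest_exact_match(read_length, mismatch_positions):
    """Longest mismatch-free stretch of a read; positions are 1-based."""
    longest = 0
    start = 1
    # Treat the position just past the read end as a final sentinel "mismatch".
    for pos in sorted(mismatch_positions) + [read_length + 1]:
        longest = max(longest, pos - start)
        start = pos + 1
    return longest


seed = longest_exact_match(32, [12, 22])
print("longest exact match:", seed)
print("expected 11-mer hits:", int(expected_kmer_hits(11)))
print("expected 19-mer hits:", expected_kmer_hits(19))
```

The mismatches split the read into exact stretches of 11, 9 and 10 bp, so no stretch reaches the default 19 bp seed, while an 11 bp seed would have over a thousand candidate positions to extend.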