Question

Bowtie Pair-End Broken?

9

14.1 years ago

Pablo ★ 1.9k

According to this article in the Journal of Human Genetics (Apr-28, 2011), Bowtie just doesn't work with paired-end reads. For details, see table 2 (Bowtie maps only 0.02% of the reads in paired-end mode, whereas Bwa maps 99.46%) and table 3 (only 24.6% get mapped, against 80.4% for Bwa).

I have always found Bowtie extremely picky when mapping paired-end reads, and I usually use Bwa. Nevertheless, I hear people saying that they use Bowtie all the time.

The question is: in your experience, do you think that Bowtie is incapable of mapping paired-end reads, as this article states, or is this just a mistake by the authors?

bowtie next-gen sequencing paired • 12k views

ADD COMMENT • link updated 13.8 years ago by brentp 24k • written 14.1 years ago by Pablo ★ 1.9k

0

I guess it's safe to say that our experiences disagree with this publication. Most of us have had problems with the default insert size in Bowtie, and maybe this leads to degraded sensitivity. But for sure it is never as bad as they claim.

ADD REPLY • link 14.1 years ago by Pablo ★ 1.9k

0

I've contacted the authors pointing them to your comments in this thread. They answered within a few hours with an updated version of the paper. In my opinion, this is a very professional attitude.

ADD REPLY • link 14.1 years ago by Pablo ★ 1.9k

0

@Pablo, that's pretty cool they were able (and willing) to update the paper.

ADD REPLY • link 14.1 years ago by brentp 24k

score 16 · Answer 1 · 2011-06-14

16

14.1 years ago

brentp 24k

@Brad is right. Though with the default parameters to dwgsim (which was used to generate the paired end reads), if both the documentation and implementations of bowtie and dwgsim were correct, it should have mapped more reads than it did. I wanted to understand why.

I ran a small simulation on a single chromosome. Indeed, with the default parameters for bowtie (as used in the paper), only about 0.1% of reads are mapped.

If I increase the max-insert-size to 700 (default is 250), then over 65% of the pairs are mapped. The insert size distribution of the mapped reads looks like this:

[Figure: insert size distribution of the mapped pairs]

So, although dnaa was told to use an average outer distance of 300, it actually seems to be generating pairs with an inner distance of 300, since the mean is about 300 + 76 + 76 == 452 (the inner distance plus the two 76 bp reads).
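The arithmetic above can be written out as a minimal sketch. The function names are hypothetical and chosen for illustration; the numbers (inner distance 300, read length 76, limits 250 and 700) come from this thread. It assumes the aligner's maximum insert size is compared against the outer distance, that is, the full fragment including both reads.

```python
def outer_distance(inner_distance: int, read_length: int) -> int:
    """Outer (fragment) distance: the gap between the mates plus both reads."""
    return inner_distance + 2 * read_length


def within_max_insert(outer: int, max_insert: int) -> bool:
    """True if a pair with this outer distance fits under the aligner's limit."""
    return outer <= max_insert


# Values taken from the thread: inner distance 300, 76 bp reads.
outer = outer_distance(inner_distance=300, read_length=76)
print(outer)                          # 452
print(within_max_insert(outer, 250))  # False: rejected under the default limit
print(within_max_insert(outer, 700))  # True: accepted once the limit is raised
```

With a mean fragment of about 452 bp, a limit of 250 rejects nearly every pair, which is consistent with the very low default mapping rate reported here.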

The reason that bwa maps them (even though its default max insert size is only 500) is that it doesn't actually use that parameter unless it is unable to infer the insert size on its own from the reads. From the docs:

-a INT
Maximum insert size for a read pair to be considered being mapped properly. 
Since 0.4.5, this option is only used when there are not enough good 
alignment to infer the distribution of insert sizes. [500]

So it correctly infers a much larger insert size.
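To illustrate the idea of inferring the limit from the data and only falling back to the option when there are too few pairs, here is a rough sketch. It is not bwa's actual algorithm: the quartile-based cutoff, the `min_pairs` threshold, and the function name are all assumptions made for illustration. Only the fallback value of 500 comes from the docs quoted above.

```python
import statistics


def infer_max_insert(insert_sizes, fallback=500, min_pairs=20):
    """Estimate an upper bound on insert size from confidently mapped pairs.

    Illustrative only: uses Q3 + 3 * IQR as the cutoff (an assumption, not
    bwa's method) and returns the fallback when too few pairs are available.
    """
    if len(insert_sizes) < min_pairs:
        # Not enough good alignments: use the user-supplied maximum.
        return float(fallback)
    q1, _median, q3 = statistics.quantiles(insert_sizes, n=4)
    return q3 + 3 * (q3 - q1)


# Too few pairs: the fallback of 500 is used.
print(infer_max_insert([450, 452, 455]))

# Enough pairs clustered around 452: the bound adapts to the data.
print(infer_max_insert(list(range(430, 475))))
```

The point of the design is that a fixed default only matters when the data cannot speak for itself, so a library with 452 bp fragments is still handled without the user changing any option.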

I documented what I did here.

ADD COMMENT • link 14.1 years ago by brentp 24k

2

Inferring the insert size distribution while mapping is not straightforward, as we have to map the reads first to get the estimate. BWA's batch processing and the 2-stage mapping happen to make the inference much easier. I say "happen to" because this was not a feature of the initial design. BWA did try hard to make the default options work well for various types of input because 1) less experienced users may use the wrong settings; 2) I may forget to apply the right options; 3) input of mixed quality needs to be processed differently.

ADD REPLY • link 14.1 years ago by lh3 33k

1

That should have been caught in review; it's pretty clear the authors didn't bother to look at why bowtie was doing so much worse than other methods.

ADD REPLY • link 14.1 years ago by David Quigley 11k

1

Also, on typical simulated data, I am pretty sure that most of the major mappers have very similar sensitivity. The specificity will be very different, but the authors did not measure it. On real data of "standard quality", the mapping sensitivity is typically around 90-96% for the major mappers. Also, I think bowtie should use less memory than BWA, at least for single-end. SOAP2's memory is never as bad as 13GB. While I am happy that BWA performs well according to their evaluation, I think that on sensitivity/speed/memory, others should be as good or even better.

ADD REPLY • link 14.1 years ago by lh3 33k

0

Brent -- nice analysis. I didn't realize bwa had a size inference step. This is a great demonstration of the power of default values.

ADD REPLY • link 14.1 years ago by Brad Chapman 9.7k

0

Thanks. Yes, and it's also surprising that bowtie has such a default for the maximum insert size (I guess it helps speed).

ADD REPLY • link 14.1 years ago by brentp 24k

0

@lh3 I always thought that the 2-stage mapping was a design feature to optimize that :-)

ADD REPLY • link 14.1 years ago by Pablo ★ 1.9k

0

@Pablo: I used the 2-stage mapping initially because I was lazy...

ADD REPLY • link 14.1 years ago by lh3 33k

0

Within hours of hearing of this, Nils Homer updated the docs for dnaa to show that it generates pairs with an inner distance specified by -d, not the outer distance.

ADD REPLY • link 14.1 years ago by brentp 24k

score 5 · Answer 2 · 2011-06-14

5

14.1 years ago

Brad Chapman 9.7k

The bowtie default for paired-end maximum insert sizes is rather small, only 250bp:

% bowtie --help | grep maxins
-X/--maxins <int>  maximum insert size for paired-end alignment (default: 250)

Failing to raise this for libraries with larger insert sizes will lead to low mapping rates.

bwa sampe is a little more generous, so it will do better with default parameters:

 % bwa sampe
 [...] 
 Options: -a INT   maximum insert size [500]

This also confused me at first, but if you set similar parameters for both, the alignment rates will be similar.
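As a sketch of what setting the parameter looks like in practice, the snippet below assembles a paired-end bowtie command line with an explicit `-X`. It only builds and prints the command; nothing is executed. The helper name, the index name, and the file names are hypothetical placeholders, and 700 is simply the value tried elsewhere in this thread, not a recommendation for every library.

```python
import shlex


def bowtie_paired_command(index, mates1, mates2, output, max_insert=700):
    """Build (but do not run) a bowtie paired-end command with an explicit -X."""
    return [
        "bowtie",
        "-X", str(max_insert),  # maximum insert size; the bowtie default is 250
        "-S",                   # write SAM output
        index,
        "-1", mates1,           # first mates
        "-2", mates2,           # second mates
        output,
    ]


# Hypothetical index and file names, for illustration only.
cmd = bowtie_paired_command("hg18", "reads_1.fastq", "reads_2.fastq", "aligned.sam")
print(shlex.join(cmd))
```

Keeping the limit as an explicit argument makes it harder to forget when comparing aligners, which is the failure mode discussed in this thread.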

ADD COMMENT • link 14.1 years ago by Brad Chapman 9.7k

0

I agree, I always change this setting when analysing data with Bowtie.

ADD REPLY • link 14.1 years ago by Pablo ★ 1.9k

score 3 · Answer 3 · 2011-06-14

3

14.1 years ago

Josh ▴ 30

In my experience, BWA is better than Bowtie for mapping PE reads, but Bowtie isn't as bad as the authors say. If you look at the supplemental data, they use the following command for bowtie:

bowtie -t -p 8 -v 2 -a bowtie/hg18 -q ERR008834.filt.fastq >bowtie.map

The -v option makes Bowtie ignore the quality scores and just sets a hard limit on mismatches. BWA on the other hand is a lot more lenient with mapping a second read if the first in the pair maps well.

ADD COMMENT • link 14.1 years ago by Josh ▴ 30

0

I agree, I found Bowtie a little bit picky, but not fundamentally broken as they claim.

ADD REPLY • link 14.1 years ago by Pablo ★ 1.9k