Simple Question: What Is "Large" For Bwa?
12.2 years ago
Ketil 4.1k

I'm using bwa to align reads, and I have to choose which indexing algorithm to use. The documentation says to use -a is for small genomes and -a bwtsw for large genomes. I've used is, but it sometimes crashes with a segmentation fault where bwtsw works. Surely there is a better way to decide this than trial and error? At what size does a genome count as "large", meaning I should use bwtsw? And what is the actual difference between the two?

bwa alignment • 7.8k views
12.2 years ago
Vikas Bansal ★ 2.4k

From the BWA manual:

is     IS is moderately fast, but does not work with database larger than 2GB.

bwtsw  Algorithm implemented in BWT-SW. This method works with the whole human genome, but it does not work with database smaller than 10MB and it is usually slower than IS.
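
To make those two limits concrete, here is a minimal Python sketch that lists which -a values the quoted text allows for a given reference size. Reading "2GB" and "10MB" as decimal counts of bases is an assumption on my part, and usable_algorithms is a hypothetical helper name, not part of bwa.

```python
# Limits quoted from the BWA manual. Treating "2GB" and "10MB" as decimal
# counts of bases is an assumption made for this sketch.
IS_MAX_BASES = 2_000_000_000
BWTSW_MIN_BASES = 10_000_000


def usable_algorithms(n_bases):
    """Return the -a values that the manual's stated limits allow."""
    usable = []
    if n_bases <= IS_MAX_BASES:
        usable.append("is")
    if n_bases >= BWTSW_MIN_BASES:
        usable.append("bwtsw")
    return usable


print(usable_algorithms(5_000_000))      # below the bwtsw minimum
print(usable_algorithms(100_000_000))    # inside both limits
print(usable_algorithms(3_100_000_000))  # above the is maximum
```

Between the two limits either algorithm is allowed by the manual's wording, so the choice there comes down to speed.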

12.2 years ago
matted 7.8k

It's not documented online as far as I can tell, but bwa 0.6.2 will choose the correct method automatically if you don't specify an algorithm.

The command line help hints at this:

Usage:   bwa index [-a bwtsw|is] [-c] <in.fasta>

Options: -a STR    BWT construction algorithm: bwtsw or is [auto]

Looking in the code, the decision is made by looking at the size of the reference genome:

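// simplified sketch of the logic in bwtindex.c, not the verbatim source
// (needs <string.h>; == on C strings would compare pointers, not contents)
if (strcmp(algo_type, "auto") == 0) {
    if (l_pac > 50000000)
        algo_type = "bwtsw";
    else
        algo_type = "is";
}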

I believe l_pac is the total number of bases in the reference (but the code is pretty dense so I'm not positive).
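
If you want to predict which algorithm the automatic choice would pick before running bwa index, a rough Python sketch is to count the bases in the FASTA file and apply the same 50,000,000 cutoff. This assumes l_pac really is the total base count, as guessed above. The names count_bases and auto_algorithm are made up for illustration, and the counter simply counts every character on non-header lines.

```python
import io

AUTO_THRESHOLD = 50_000_000  # cutoff taken from the simplified snippet above


def count_bases(handle):
    """Sum sequence lengths in a FASTA stream, skipping '>' header lines."""
    total = 0
    for line in handle:
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def auto_algorithm(n_bases):
    """Mirror the auto rule: bwtsw above the threshold, otherwise is."""
    return "bwtsw" if n_bases > AUTO_THRESHOLD else "is"


# Tiny in-memory FASTA so the sketch runs without any input file.
demo = io.StringIO(">chr1\nACGTACGT\nACGT\n>chr2\nGGCC\n")
n = count_bases(demo)
print(n, auto_algorithm(n))  # prints: 16 is
```

For a real reference, pass an open file handle instead of the in-memory demo, for example the result of open() on your FASTA path.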
