Question

Megablast, Makembindex, And Choosing Nmers

13.9 years ago

Semenko ▴ 120

I've been using MegaBLAST (in BLAST+) to test for high-similarity matches against hg18. It's far faster than regular blastn for this purpose, but I'm not sure which settings I should use when building the MegaBLAST index for the database.

Example commands:

$ makembindex -input hg18 -output hg18 -iformat blastdb
$ blastn -task blastn -db hg18 -query input.fa -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6
$ blastn -db hg18 -use_index true -query input.fa -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6

The -task blastn command runs in ~2 hours, while the indexed (MegaBLAST) command runs in ~3 minutes.

The makembindex command has a lot of options I'm not using. Can I shrink the large (10+ GB) MegaBLAST index by choosing a different -nmer value? Or are there important tweaks to -stride?

From http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/algo/blast/dbindex/makeindex/README.usage:

-nmer nmer_size
    default: 12
    N-mer size to use. This parameter is ignored if -legacy true is
    specified.

-stride stride
    default: 5
    makembindex will index every stride-th N-mer of the database.

(see also ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast/README.usage)
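
To make the stride question concrete, here is a rough back-of-the-envelope sketch of how many N-mers would be indexed if "every stride-th N-mer" is taken literally. The function name `indexed_nmers` and the 3,000,000,000-base length are my own placeholders, not anything from makembindex, and the estimate assumes one contiguous sequence and ignores any fixed overhead in the index format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BLAST+): count the N-mers that start at
# positions 0, stride, 2*stride, ... and still fit inside a sequence.
# Assumption: the database is treated as one contiguous sequence.
indexed_nmers() {
  local seq_len=$1 nmer=$2 stride=$3
  if [ "$seq_len" -lt "$nmer" ]; then
    echo 0
    return
  fi
  echo $(( (seq_len - nmer) / stride + 1 ))
}

# Placeholder genome length; substitute the real total for your database.
genome_len=3000000000

# Compare the default stride (5) with a larger one at the default nmer (12).
for stride in 5 10; do
  echo "stride=$stride -> $(indexed_nmers "$genome_len" 12 "$stride") indexed N-mers"
done
```

If this reading is right, doubling the stride would roughly halve the number of indexed positions, but I don't know how that translates into on-disk index size or search sensitivity.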

blast • 6.8k views