Megablast, Makembindex, And Choosing Nmers
13.1 years ago
Semenko ▴ 120

I've been using MegaBLAST (in BLAST+) to test for high-similarity matches against hg18. It's dramatically faster than normal blastn for this purpose, but I'm not sure what settings I should use when building MegaBLAST indexes.

Example commands:

$ makembindex -input hg18 -output hg18 -iformat blastdb
$ blastn -task blastn -db hg18 -query input.fa -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6
$ blastn -db hg18 -use_index true -query input.fa -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6


The -task blastn command runs in ~2 hours, while the indexed (MegaBLAST) command runs in ~3 minutes.

The makembindex command has a lot of options I'm not using -- can I shrink the large (10+ GB) MegaBLAST index by choosing a different nmer size? Or are there important tweaks to stride?

From http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/algo/blast/dbindex/makeindex/README.usage:

-nmer nmer_size
    default: 12
    N-mer size to use. This parameter is ignored if -legacy true is
    specified.

-stride stride
    default: 5
    makembindex will index every stride-th N-mer of the database.

(see also ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast/README.usage)
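For a rough sense of how the two parameters quoted above scale, here is a back-of-the-envelope sketch in shell. It uses only what the README states (every stride-th N-mer is indexed) plus the fact that there are 4^N distinct N-mers over A/C/G/T. The helper names `sampled_positions` and `distinct_nmers` are made up for illustration, the 3 Gbp database length is an assumed round stand-in for hg18, and the nmer values are chosen purely for the arithmetic, without checking what makembindex accepts. It does not model the real on-disk index format, so treat the numbers as relative, not as predicted file sizes.

```shell
#!/bin/sh
# Illustrative arithmetic only; NOT a model of makembindex's actual index layout.

# Approximate number of N-mer start positions kept when every
# stride-th N-mer of the database is indexed.
# $1 = database length in bases, $2 = stride
sampled_positions() {
    echo $(( $1 / $2 ))
}

# Number of distinct N-mers over a 4-letter alphabet: 4^nmer.
# $1 = nmer size
distinct_nmers() {
    count=1
    i=0
    while [ "$i" -lt "$1" ]; do
        count=$(( count * 4 ))
        i=$(( i + 1 ))
    done
    echo "$count"
}

DB_LEN=3000000000   # assumption: ~3 Gbp, a round figure standing in for hg18

# Hypothetical stride values; 5 is the documented default.
for s in 5 10 20; do
    echo "stride=$s sampled_positions=$(sampled_positions "$DB_LEN" "$s")"
done

# Hypothetical nmer values; 12 is the documented default.
for n in 11 12; do
    echo "nmer=$n distinct_nmers=$(distinct_nmers "$n")"
done
```

If the index size is dominated by the list of sampled positions (an assumption, not something the README confirms), then stride looks like the more direct lever for shrinking it, while nmer mainly changes how many distinct lookup keys exist.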
