13.2 years ago
Semenko
I've been using indexed MegaBLAST (in BLAST+) to test for high-similarity matches against hg18. It's far faster than normal blastn for this purpose, but I'm not sure what settings I should use when building the MegaBLAST index.
Example commands:
$ makembindex -input hg18 -output hg18 -iformat blastdb
$ blastn -task blastn -db hg18 -query input.fa -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6
$ blastn -db hg18 -use_index true -query input.fa -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6
The -task blastn command runs in ~2 hours, while the indexed (MegaBLAST) command runs in ~3 minutes.
The makembindex command has a lot of options I'm not using. Can I shrink the large (~10+ GB) MegaBLAST index by choosing a different nmer? Or are there important tweaks to stride?
-nmer nmer_size
    default: 12
    N-mer size to use. This parameter is ignored if -legacy true is specified.
-stride stride
    default: 5
    makembindex will index every stride-th N-mer of the database.
(see also ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/indexed_megablast/README.usage)
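To make the trade-off concrete, here is a rough back-of-the-envelope sketch based only on the help text above ("index every stride-th N-mer"). The function names and the 3,000,000,000 bp genome length are my own placeholders, and the count is of sampled positions only, not the real on-disk index size, which I have not measured. The second function is plain arithmetic: an exact match of length W contains W - nmer + 1 consecutive N-mer start positions, so it is only guaranteed to contain a sampled one when W >= nmer + stride - 1. Whether makembindex or blastn enforces that bound is an assumption on my part.

```shell
#!/bin/sh
# Hypothetical helpers, not part of BLAST+.

# Approximate number of N-mer start positions sampled from a sequence of
# length seq_len when only every stride-th position is indexed.
# Ignores all per-index overhead, so it is not a file-size prediction.
indexed_positions() {
    seq_len=$1; nmer=$2; stride=$3
    echo $(( (seq_len - nmer) / stride + 1 ))
}

# Shortest exact-match length that is guaranteed to contain at least one
# sampled N-mer: W - nmer + 1 >= stride, i.e. W >= nmer + stride - 1.
min_guaranteed_word() {
    nmer=$1; stride=$2
    echo $(( nmer + stride - 1 ))
}

# Defaults from the help text (nmer 12, stride 5) versus a sparser stride,
# using an assumed round-number genome length.
indexed_positions 3000000000 12 5
indexed_positions 3000000000 12 10
min_guaranteed_word 12 5
min_guaranteed_word 12 10
```

If this reasoning holds, a larger stride should cut the number of sampled positions roughly in proportion, at the cost of raising the shortest match length that is certain to be seeded.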