Hi, I am surprised to see such a low number of posts about megablast indexing... Is this because it does not work? If I believe this one, indexing should really help to get results faster. But after some trials, I cannot observe any such improvement. One potential problem is that the makembindex command creates one file fewer than it lists in its output:
creating GG.00.idx
creating GG.01.idx
But only GG.00.idx appeared on the file system. (I tried on 2 computers with different processors, with BLAST+ 2.2.25 compiled independently on each machine.)
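A minimal sketch for checking this systematically, assuming the makembindex messages were saved to a log file. The function name missing_volumes and the log file name used below are made up for illustration; they are not part of BLAST+.

```shell
# Hypothetical helper: list index volumes that the makembindex log
# announced ("creating NAME.idx") but that are absent from a directory.
# Usage: missing_volumes LOGFILE [DIRECTORY]
missing_volumes() {
    log=$1
    dir=${2:-.}
    # Keep only lines of the form "creating <file>.idx" and print the file name.
    awk '$1 == "creating" && $2 ~ /\.idx$/ { print $2 }' "$log" |
    while IFS= read -r vol; do
        # Report the volume only if no such file exists in the directory.
        [ -e "$dir/$vol" ] || printf '%s\n' "$vol"
    done
}
```

For example, `missing_volumes makembindex.log .` would print GG.01.idx in the situation described above, and nothing if every announced volume is present.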
First, I tried to megablast a file against Greengenes. Apart from the fact that it took the same time to run, the only difference was that the indexed megablast used 6 to 7 times more RAM than the non-indexed run. Despite the potentially missing index file, the BLAST results were exactly the same (checked with the UNIX diff command). I assumed that indexing improves the speed only for bigger DBs,
so I tried against a huge one, GenBank nt:
############ indexing db
makembindex -input nt -output nt -iformat blastdb
########################## megablast
### index
time blastn -task megablast -use_index true -db nt -query E1.454.fasta.1 -out megaBIGWithIndexNT.blast -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6 > megaBIGWithIndexNT.out&
### without
time blastn -task megablast -use_index false -db nt -query E1.454.fasta.1 -out megaBIGNoIndexNT.blast -evalue 1e-05 -num_descriptions 1 -num_alignments 1 -outfmt 6 > megaBIGNoIndexNT.out&
The results are very bad:
- there are fewer results with the index
- it took 1 day without the index, and 3 days with it...
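To see which queries lose their hits, a rough sketch along these lines could help. It relies only on the fact that column 1 of -outfmt 6 output is the query ID; the function name queries_only_in is made up for illustration.

```shell
# Hypothetical helper: print query IDs that have at least one hit in the
# first tabular (-outfmt 6) file but none in the second.
# Usage: queries_only_in FIRST.blast SECOND.blast
queries_only_in() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Column 1 of -outfmt 6 is the query ID; comm needs sorted input.
    cut -f1 "$1" | LC_ALL=C sort -u > "$a"
    cut -f1 "$2" | LC_ALL=C sort -u > "$b"
    # -23 keeps only the lines unique to the first list.
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

Running `queries_only_in megaBIGNoIndexNT.blast megaBIGWithIndexNT.blast` would list the queries that got a hit only in the run without the index.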
What do you think about that?
Thanks for this first answer, DK (and the paper mining!). I would be interested in personal experience, to know first whether I'm doing it right, and second, in which specific cases indexing is worth using.