Hi all,
I'm using ncbi-blast-2.2.24+ (on Ubuntu Linux) for a sizable all-vs-all BLAST of protein sequences (530,000 lines of FASTA). It has already been running for over an hour, so I'm looking into ways to speed it up.
What I've done is run:
ncbi-blast-2.2.24+/bin/makeblastdb -in good_proteins.fasta -dbtype prot -out my_prot_blast_db
followed by:
ncbi-blast-2.2.24+/bin/blastp -db my_prot_blast_db -query good_proteins.fasta -outfmt 6 -out all-vs-all.tsv -num_threads 4
Now firstly: Is this the correct way to do an all-vs-all blast?
And secondly: How can I speed this up?
I added -num_threads 4 in the hope of making it use all four of my processing cores, but it just alternates between using 100% of one CPU at a time, with the other three near idle. (Being a CS graduate I'm aware of the distinction between cores & threads, but I didn't see any other configuration option that seemed related: http://www.ncbi.nlm.nih.gov/books/NBK1763/)
Possibly thirdly: Is it at all reasonable to expect this all-vs-all BLAST on such a dataset to run in a manageable amount of time, or should I somehow divide this up / move to supercomputers?
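To make the "divide this up" option concrete, here is a rough sketch of what I have in mind: split the query FASTA into a few chunks and run one blastp per chunk against the same database. This is untested on the real data. The helper names (split_fasta, blastp_command) and the chunk file names are made up for illustration, and it assumes the blastp binary is on the PATH. The blastp flags are the same ones I used above.

```python
def split_fasta(lines, n_chunks):
    """Distribute FASTA records round-robin over n_chunks lists of lines.

    A record is a '>' header line plus every line up to the next header,
    so a sequence is never cut in half.
    """
    chunks = [[] for _ in range(n_chunks)]
    record_index = -1
    for line in lines:
        if line.startswith(">"):
            record_index += 1
        if record_index < 0:
            continue  # ignore anything before the first header
        chunks[record_index % n_chunks].append(line)
    return chunks


def blastp_command(chunk_path, db="my_prot_blast_db"):
    """Build the blastp argument list for one chunk (hypothetical helper).

    Only flags already used in the original command appear here.
    """
    return [
        "blastp",
        "-db", db,
        "-query", chunk_path,
        "-outfmt", "6",
        "-out", chunk_path + ".tsv",
    ]
```

The idea would be to write each chunk to its own file (say chunk_0.fasta to chunk_3.fasta), start the four blastp commands side by side, and concatenate the resulting .tsv files afterwards. Since every chunk is searched against the full database, the combined output should still be an all-vs-all result, but I'd appreciate confirmation of that.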
(And maybe fourthly: I just chose ncbi-blast because I thought it'd be a good choice; would any other tool be better at handling this case?)
Best regards, Tim
I'll probably have to move in this direction as I was indeed already using the tabular output (-outfmt 6 in the newest blast version).