When I run blastn -num_threads 6 -task blastn-short -db human_genomic on a single 36 bp query, it typically takes less than a minute and uses up to about 5 GB of memory. When I gave it a file (using -query) of 606 queries, all of the same length, blast took all weekend (8.5 minutes per query on average). top suggests it was trying to use 60 gigabytes of memory. Also, time says that instead of running six times faster (-num_threads 6) it was effectively using only 27% of one CPU.
Is this expected? In future I plan to split the query file into 6 and run them in series. Does that sound sensible?
(The server has 8 CPUs and 32 GBytes of RAM.)
Many thanks, Bill
ps: the font in this window is too small for me:-(
Update: I tried splitting another file of queries into ten files of either 56 or 57 queries each. Even the slowest of these averaged about three queries per minute, whilst the fastest did more than 12 per minute.
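For anyone wanting to reproduce the split described above, here is a minimal Python sketch. It assumes the queries are in a plain FASTA file where every record starts with a ">" header line. The function names and the output file naming (prefix_01.fa, prefix_02.fa, ...) are made up for illustration and are not part of BLAST+.

```python
from pathlib import Path


def read_fasta_records(path):
    """Return a list of FASTA records, each a string starting with '>'."""
    records = []
    current = []
    with open(path) as handle:
        for line in handle:
            # A new header line closes the record collected so far.
            if line.startswith(">") and current:
                records.append("".join(current))
                current = []
            if line.strip():  # skip blank lines
                current.append(line)
    if current:
        records.append("".join(current))
    return records


def split_evenly(records, n_chunks):
    """Split records into n_chunks lists whose sizes differ by at most one."""
    base, extra = divmod(len(records), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


def write_chunks(chunks, prefix):
    """Write each chunk to <prefix>_NN.fa (hypothetical naming) and return the paths."""
    paths = []
    for i, chunk in enumerate(chunks, start=1):
        path = Path(f"{prefix}_{i:02d}.fa")
        path.write_text("".join(chunk))
        paths.append(path)
    return paths
```

Calling write_chunks(split_evenly(read_fasta_records("queries.fa"), 10), "queries_part") would write ten files, each of which can then be given to blastn with -query, one after another. The file name queries.fa is only a placeholder.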
My computer does 50 sequences per second. I am not sure why the difference is so huge. Where did you get your BLAST from? Which version is it? Are your files on network storage? My BLAST has different parameter names than yours. Did you run formatdb?
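To put the figures quoted in this thread on one scale, here is a small Python sketch that converts them all to queries per minute. The numbers are the approximate ones reported above ("about three" and "more than 12" are taken as 3 and 12), and it is only an assumption that the runs are comparable in query length and database.

```python
# Throughput figures reported in the thread, converted to queries per minute.
rates = {
    "606 queries in one file (8.5 min each)": 1 / 8.5,
    "slowest split file (about 3 per minute)": 3.0,
    "fastest split file (more than 12 per minute)": 12.0,
    "other machine (50 per second)": 50 * 60,
}

baseline = rates["606 queries in one file (8.5 min each)"]


def speedup(rate, reference=baseline):
    """Return how many times faster `rate` is than `reference`."""
    return rate / reference


for label, rate in rates.items():
    print(f"{label}: {rate:.2f} queries/min, {speedup(rate):.1f}x the single-file run")
```

On these numbers the split files run roughly 25 to 100 times faster per query than the single 606-query run, and the 50-per-second machine is about 25,000 times faster.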
Wow. So something is wrong!
blastn was downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.25+-x64-linux.tar.gz
I am now using files on a local disk but this does not seem to be any faster than using files on a network disk. In my case blastn is CPU limited (unless it runs out of RAM and starts paging).
I used http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl to download
human_genomic*tar.gz
My reading of the documentation suggested that these files are "preformatted" and so I did not explicitly use formatdb.
Thank you
Bill
So you didn't use formatdb? That may be a reason. Files on your local hard disk can be accessed much faster than network files.