Hi all,
I'm confused about a very basic issue. I have a large query file (about 50 MB) that I want to run through blastx. To make it easier, I split it into several files (named x00, x01, x02, etc.), but now I'm not sure what the right command is to run blastx on these query files. Thanks in advance for sharing your commands.
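One thing to check before running anything: if the chunks were made with the Unix split command (names like x00, x01 suggest split -d), splitting by lines or bytes can cut a FASTA record in half. Below is a minimal Python sketch, assuming the query is a plain FASTA file, that splits only at record boundaries and writes chunks with the same x00, x01, ... naming. The function name `split_fasta` and its parameters are illustrative, not part of any BLAST tool.

```python
import os
from typing import List


def split_fasta(path: str, n_chunks: int, out_dir: str = ".", prefix: str = "x") -> List[str]:
    """Split a FASTA file into n_chunks files without breaking any record.

    Records are dealt out round-robin so each chunk gets a similar number of
    sequences. Output names follow the split -d style: x00, x01, ...
    """
    # Collect records as lists of lines; a new record starts at each '>' header.
    # Any lines before the first header (e.g. blank lines) are ignored.
    records: List[List[str]] = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                records.append([line])
            elif records:
                records[-1].append(line)

    out_paths = [os.path.join(out_dir, f"{prefix}{i:02d}") for i in range(n_chunks)]
    handles = [open(p, "w") for p in out_paths]
    try:
        # Record i goes to chunk i mod n_chunks, so no record is ever split.
        for i, record in enumerate(records):
            handles[i % n_chunks].writelines(record)
    finally:
        for h in handles:
            h.close()
    return out_paths
```

Round-robin keeps the sequence counts balanced; if sequence lengths vary a lot, balancing by total residues per chunk would spread the work more evenly.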
Would -num_threads not work?
Threading the BLAST operation doesn't help much, because most of the work still runs serially; it only parallelizes some of the overhead, on the assumption that each thread will be disk-I/O bound. To force real parallelism and let the OS deal with disk I/O, we manually run several instances, one per query chunk. You can test this yourself: you should see a speedup until about 2-4 processes are running, after which they slow down regardless of CPU count.
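A minimal Python sketch of the "run several instances" approach, assuming BLAST+ blastx is on the PATH, the chunks are named x00, x01, ... as in the question, and a local protein database exists under the hypothetical name "nr". Each chunk gets its own blastx process with -num_threads 1, and a pool caps how many run at once (4 by default, in line with the 2-4 process observation above). The e-value cutoff, output file naming, and the helper names `blastx_command` and `run_chunks` are illustrative choices, not a recommended setup.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List


def blastx_command(query: str, db: str, evalue: float = 1e-5, outfmt: int = 6) -> List[str]:
    """Build one BLAST+ blastx command line for a single query chunk."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-out", f"{query}.blastx.tsv",  # hypothetical naming: one result file per chunk
        "-outfmt", str(outfmt),         # 6 = tabular output
        "-evalue", str(evalue),
        "-num_threads", "1",            # one thread per process; parallelism comes from the pool
    ]


def run_chunks(queries: List[str], db: str, max_procs: int = 4) -> List[int]:
    """Run one blastx process per chunk, at most max_procs at a time.

    Returns the exit code of each blastx run, in the same order as queries.
    """
    def run(query: str) -> int:
        # Each worker thread simply waits on its own external blastx process.
        return subprocess.run(blastx_command(query, db)).returncode

    with ThreadPoolExecutor(max_workers=max_procs) as pool:
        return list(pool.map(run, queries))
```

For example, `run_chunks(sorted(glob.glob("x[0-9][0-9]")), db="nr")` (after `import glob`) would process every chunk, and the per-chunk .blastx.tsv files can then be concatenated. A thread pool is enough here because the threads do no work themselves; they only wait on the separate blastx processes.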
Thanks so much for your prompt reply, Ram. I'll try it. You mentioned that, in your experience, blastall was much faster than ncbi-blast+ at running blastx on small queries. Could you please let us know whether you ever compared the results of the two programs on the same query file, and whether they were identical?
Yes, I did find that blastall was faster than BLAST+ for smaller query sequences, but that was in 2013, so it may no longer hold; BLAST+ may have been optimized since then.
I'm sorry, I did not compare the results. This was early in my HPC experience, so I was making a lot of mistakes, and BLAST+ was taking too long per learning cycle. Also, I just wanted to get the job done and wasn't looking into such nuanced matters, sorry.
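For anyone who wants to run that comparison, here is a rough Python sketch, assuming both runs wrote tabular output (blastall -m 8 or BLAST+ -outfmt 6), where the first two columns are the query and subject IDs. It only compares which query/subject pairs were reported, not scores or alignments. The file names and the helpers `load_pairs` and `compare` are hypothetical, and subject IDs may be formatted differently by the two programs, so some ID normalization might be needed before the sets are comparable.

```python
import csv
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def load_pairs(path: str) -> Set[Pair]:
    """Read (query id, subject id) pairs from a tab-separated BLAST hit table."""
    pairs: Set[Pair] = set()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines, '#' comment lines, and malformed rows.
            if len(row) < 2 or row[0].startswith("#"):
                continue
            pairs.add((row[0], row[1]))
    return pairs


def compare(path_a: str, path_b: str) -> Dict[str, Set[Pair]]:
    """Split hit pairs into those found only in A, only in B, or in both."""
    a, b = load_pairs(path_a), load_pairs(path_b)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}
```

If "only_a" and "only_b" both come back empty, the two programs reported the same hits, though e-values and alignment boundaries could still differ.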
Thanks, Ram.