Question

Running BLAST on several split query files

0

9.6 years ago

seta ★ 1.9k

Hi all,

I'm confused by a very basic issue. I have a large query file (about 50 MB) that I need to run through blastx. To make this manageable, I split it into several files (named x00, x01, x02, etc.), but I'm not sure of the right command to run the BLAST job on these files. Thanks for sharing your commands.

blast alignment RNA-Seq • 2.9k views

updated 21 months ago by Ram 44k • written 9.6 years ago by seta ★ 1.9k

Ram · Answer 1 · 2015-05-03

2

9.6 years ago

Ram 44k

If you have access to an HPC, run an array job - that's the best way to get this done fastest.

For serial processing, just use a loop:

for num in $(seq 1 10)
do
    # give each run its own output file so results are not overwritten
    blastx -query "input_${num}" -db database.blastdb -out "output_${num}.out"
done

You can use GNU parallel or different script files to deal with each BLASTX run.
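
As a rough sketch of the GNU parallel route, assuming the split files are named x00, x01, ... as in the question and the database is called database.blastdb as above: one option is to print one blastx command per file, check the list, and then pipe it to parallel. The function name `blast_cmds` and the `.blastx.out` output suffix are made up for illustration, and file names containing spaces are not handled.

```shell
# Print one blastx command per query file (dry run; nothing is executed here).
# Usage: blast_cmds DATABASE FILE...
# blast_cmds is a hypothetical helper name, not part of BLAST+ or GNU parallel.
blast_cmds() {
    db="$1"
    shift
    for query in "$@"; do
        # -query, -db and -out are standard BLAST+ options;
        # each query file gets its own output file
        printf 'blastx -query %s -db %s -out %s.blastx.out\n' "$query" "$db" "$query"
    done
}

# Inspect the generated commands first:
#   blast_cmds database.blastdb x??
# Then run them four at a time; with no command given,
# GNU parallel executes each line of stdin as a command:
#   blast_cmds database.blastdb x?? | parallel -j 4
```

Printing the commands before running them makes it easy to spot a wrong file name or database path without spending any compute time.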

If you have access to an HPC, look up job arrays. The scheduler supplies an ARRAY_ID (or a similarly named iterator variable) to each job in the array, and you can use that value to control which input file the job processes. The command in the HPC script would look like:

blastx -query "input_${ARRAY_ID}" -db database.blastdb -out "output_${ARRAY_ID}.out"

and the command to submit the script would include the range of the ARRAY_ID variable like so:

qsub -t 1-10 job.pbs  # assuming your HPC uses PBS
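
If the split files keep the names from the question (x00, x01, ...), the array index has to be turned into a zero-padded file name. The following is only a sketch under assumptions: the helper names `index_to_query` and `run_array_task` are invented here, the `.blastx.out` suffix is arbitrary, and the array variable depends on the scheduler (Torque-style PBS with `qsub -t` sets `PBS_ARRAYID`; SLURM with `sbatch --array` sets `SLURM_ARRAY_TASK_ID`).

```shell
# Map a numeric array index to a zero-padded split file name:
# 0 -> x00, 7 -> x07, 12 -> x12
# (index_to_query is a hypothetical helper name)
index_to_query() {
    printf 'x%02d\n' "$1"
}

# Run blastx on the query file that belongs to one array index.
# database.blastdb is the placeholder database name used above.
run_array_task() {
    query=$(index_to_query "$1")
    blastx -query "$query" -db database.blastdb -out "$query.blastx.out"
}

# In job.pbs, the last line would then be something like:
#   run_array_task "$PBS_ARRAYID"
# and, because the file names start at x00, the submission range starts at 0:
#   qsub -t 0-9 job.pbs
```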

HTH

ADD COMMENT • link 6.2 years ago by Ram 44k

0

Wouldn't -num_threads work?

updated 6.2 years ago by Ram 44k • written 9.6 years ago by GouthamAtla 12k

0

Threading the BLAST operation doesn't do much, because most of the work still runs serially. It only parallelizes some of the overhead, on the assumption that each thread will be disk I/O-bound. To force parallelism and let the OS worry about disk I/O, we run several instances manually. If you test it, you should see a speedup until 2-4 processes are running; beyond that they slow down regardless of CPU count.

updated 6.2 years ago by Ram 44k • written 9.6 years ago by karl.stamm 4.1k

0

Thanks so much for your prompt reply, Ram. I'll try it. I heard from you that, in your experience, blastall is much faster than ncbi-blast+ when running blastx on a small query. Could you please let us know whether you have ever compared the results of the two programs for the same query file, and whether they were identical?

updated 6.2 years ago by Ram 44k • written 9.6 years ago by seta ★ 1.9k

0

Yes, I found that blastall was faster than blast+ for smaller query sequences, but that was in 2013. It might not be a valid observation today, as blast+ may have been optimized since.

I am sorry - I did not compare the results. This was early in my HPC experience, so I was making a ton of mistakes and blast+ was taking too long per learning cycle. Also, I just wanted to get it done and wasn't looking to learn such nuanced matters, sorry.

ADD REPLY • link 6.2 years ago by Ram 44k

0

Thanks Ram.

updated 21 months ago by Ram 44k • written 9.6 years ago by seta ★ 1.9k

Antonio R. Franco · Answer 2 · 2015-05-03

0

9.6 years ago

Antonio R. Franco ★ 5.2k

Try using Blast2GO as well. You can run different BLAST searches and get even more information, such as mapping, domains, etc.

updated 6.2 years ago by Ram 44k • written 9.6 years ago by Antonio R. Franco ★ 5.2k