Trying to use GNU parallel and BLAST
7.4 years ago
pablo61991 ▴ 90

Dear Biostar community,

I am having problems running a blastx search against the nr database. To speed up the process, I read some posts that mention GNU parallel as a way to make better use of multiple CPUs with ncbi-blast (local mode). Based on several posts on this forum, I adapted my code until I stopped receiving error messages. However, I think the program is not understanding me when I try to customize the default outfmt 6.

Here is the chunk of code:

module load gcc
module load ncbi-blast
module load parallel

cat MyTranscriptome.fasta | parallel -q -j 24 --block 100M --recstart '>' --pipe blastx -db /blastdb/nr -num_threads 1 -evalue 1e-5 -outfmt "6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore" -max_target_seqs 10 -max_hsps 1 > MyTranscriptome_vs_nr.outfmt.6

Before coming here to ask, I searched the posts related to this topic on this forum (and others), but I did not find out how to fix the problem.

Just to clarify (edit): the command ran without problems in the classic way (setting -num_threads to my maximum number of cores), so the base command works fine. My problem is the GNU parallel implementation.

Thank you for your time!


I just tried it with some made-up test data and it works fine for me; I get the output that I expected. What is the error you're seeing?
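A quick way to build such a test input is to cut the first few records out of the real FASTA file. This is a minimal sketch, assuming plain awk; the helper name first_n_records and the demo sequences are hypothetical and not taken from the thread.

```shell
# Hypothetical helper: print the first N records of a FASTA file.
# A record starts at a line beginning with '>'.
first_n_records() {
    # $1 = number of records to keep, $2 = FASTA file
    awk -v n="$1" '/^>/ { seen++ } seen > n { exit } { print }' "$2"
}

# Tiny made-up input so the sketch can be run anywhere.
demo=$(mktemp)
printf '>seq1\nATGC\nGGCC\n>seq2\nTTAA\n>seq3\nCCGG\n' > "$demo"

# Keep the first two records (header lines plus their sequence lines).
first_n_records 2 "$demo"
```

On real data the output would be redirected to a small file, for example `first_n_records 100 MyTranscriptome.fasta > subset.fasta` (subset.fasta is a made-up name), and the parallel command run on that file first.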


Oh, shame on me... I didn't try it on a subset of the data, and maybe it only shows output when the blast has finished. I mean, I ran this command for 12 h to try to calculate the percentage of the transcriptome blasted and estimate how much time I need. However, my output was an empty file, which put me into alert mode.

If it runs properly over a test dataset, the command is fine.

Thank you for your time, and sorry that I forgot to run a test file before panicking.
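A possible explanation for the empty file, offered as an assumption rather than a diagnosis: by default GNU parallel holds back each job's output until that job has finished, and with --block 100M every job receives roughly 100 MB of sequences, so nothing is written until a whole chunk has been searched against nr. If the transcriptome is smaller than the block size, there is only one chunk and therefore only one blastx job, whatever -j says. The sketch below estimates the chunk count from the file size. The helper estimate_chunks and the demo file are hypothetical, and the real count can differ slightly because parallel cuts at record boundaries.

```shell
# Hypothetical helper: estimate how many chunks `parallel --pipe --block SIZE`
# would cut a file into, as the ceiling of (file size / block size) in bytes.
estimate_chunks() {
    # $1 = file, $2 = block size in bytes
    size=$(wc -c < "$1" | tr -d ' ')
    echo $(( (size + $2 - 1) / $2 ))
}

# Tiny made-up input (22 bytes) so the sketch can be run anywhere.
demo=$(mktemp)
printf '>seq1\nATGC\n>seq2\nTTAA\n' > "$demo"

estimate_chunks "$demo" 10     # block much smaller than the file: several chunks
estimate_chunks "$demo" 100    # block larger than the file: a single chunk
```

For the real file the call would look like `estimate_chunks MyTranscriptome.fasta $((100 * 1024 * 1024))`, assuming that the M suffix means 1024*1024 bytes. If the result is smaller than the -j value, a smaller --block gives every job slot something to do and makes results appear sooner.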


Good to see it works for you :)


Why not just

blastx -num_threads 24 (...)

?


Isn't it with single quotes, as in:

 -outfmt '6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore'

Also, why not simply use threads as Pierre suggested?
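On the quoting question: in a bash-like shell, single and double quotes give the same string when it contains no variables or other special characters, so either form passes the format to blastx as one argument. What matters is that the string stays quoted, and in the original command that is what parallel's -q option is there for. A minimal sketch, where count_args is a hypothetical helper that only reports how many arguments a command would receive:

```shell
# The same format string written with double and with single quotes.
fmt_double="6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore"
fmt_single='6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore'

# Hypothetical helper: print the number of arguments it was called with.
count_args() { echo "$#"; }

count_args -outfmt "$fmt_double"   # quoted: the format travels as one argument
count_args -outfmt $fmt_double     # unquoted: the shell splits it on spaces

# Both quoting styles produce exactly the same string here.
if [ "$fmt_double" = "$fmt_single" ]; then
    echo "identical strings"
fi
```

The second call shows what blastx would see if the quoting were lost somewhere along the way: the column names would arrive as separate arguments instead of one format specification.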


Not all stages of blast are parallelized, so OP's approach has the potential to be faster. I/O might be an issue with 24 threads, though.


I had a problem before with multiple concurrent I/O while using blast. I thought it was a system-specific problem, though :\


If you are using a job scheduler on a cluster to manage these jobs, then there is no need for parallel and no advantage in using it.
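If the scheduler handles the parallelism, the input still has to be cut into pieces, one per job. A minimal sketch of that step, assuming plain awk; the helper split_fasta, the piece_N.fa naming and the demo sequences are hypothetical, and how each piece is then submitted depends on the scheduler in use.

```shell
# Hypothetical helper: distribute FASTA records round-robin over N files
# named <prefix>_1.fa ... <prefix>_N.fa.
split_fasta() {
    # $1 = FASTA file, $2 = number of pieces, $3 = output prefix
    awk -v n="$2" -v prefix="$3" '
        /^>/ { out = prefix "_" (rec % n + 1) ".fa"; rec++ }   # new record: pick its file
        { print > out }                                        # header and sequence lines
    ' "$1"
}

# Tiny made-up input with five records so the sketch can be run anywhere.
workdir=$(mktemp -d)
printf '>seq1\nATGC\n>seq2\nTTAA\n>seq3\nCCGG\n>seq4\nGGTT\n>seq5\nAACC\n' > "$workdir/demo.fa"

split_fasta "$workdir/demo.fa" 2 "$workdir/piece"

# Show how many records ended up in each piece.
grep -c '^>' "$workdir"/piece_*.fa
```

Round-robin assignment keeps the pieces close in record count, which helps the jobs finish at similar times. With a very large number of pieces, some awk implementations may run into a limit on simultaneously open files.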

7.4 years ago
pablo61991 ▴ 90

I have read a topic on this forum discussing how blast uses multi-core resources with this option (link here):

Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them

For a short search this may not make a big difference, but for a search against nr on shared resources I need to optimize my search as much as I can. For example:

http://www.ettemalab.org/using-for-loop-vs-gnu-parallel-for-blast/

The author reduced the time needed by 1/10, and a search against nr could take >20 days...

Thank you all for your reply.


Thank you, Pablo, for sharing these resources. Parallelizing seems promising for someone like me who needs to reduce blastx processing time.

I still have some doubts about the use of the -j option. If I understood correctly, on the Ettemalab website the -j option is used to run blast jobs in parallel, while in the other Biostars discussion they suggested using -j to run jobs serially.

Moreover, how can I best choose how many blast jobs to run? Why did you choose -j 24 in your first post?

Thank you for your help.
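On the -j question: in GNU parallel, -j N sets how many jobs run at the same time, so -j 1 runs them one after another and a larger value runs that many side by side. One common rule of thumb, given here as an assumption and not as the original poster's reasoning, is to keep jobs times threads per job at or below the number of available cores. The helper jobs_for below is hypothetical:

```shell
# Hypothetical helper: how many blast jobs fit on a machine so that
# jobs * threads_per_job does not exceed the number of cores.
jobs_for() {
    # $1 = available cores, $2 = threads given to each blast job (-num_threads)
    n=$(( $1 / $2 ))
    if [ "$n" -lt 1 ]; then
        n=1    # always run at least one job
    fi
    echo "$n"
}

# Cores visible to this shell; a scheduler may grant fewer than this.
cores=$(getconf _NPROCESSORS_ONLN)
echo "cores: $cores"
echo "-j for -num_threads 1: $(jobs_for "$cores" 1)"
echo "-j for -num_threads 4: $(jobs_for "$cores" 4)"
```

Under this rule, a 24-core node with -num_threads 1 gives -j 24, which matches the values in the first post. On shared resources the number of cores actually allocated to the job is the figure to use, and I/O on the database may become the limit before the CPU does.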
