Dear Biostar community,
I am having problems running blastx against the nr database. To speed up the process, I read some posts that mentioned GNU parallel as a way to make better use of multiple CPUs with ncbi-blast (local mode). Based on different posts in this forum, I adapted my code until I stopped receiving error alerts. However, I think the program is not understanding me when I try to modify the default outfmt 6.
Here is the chunk of code:
module load gcc
module load ncbi-blast
module load parallel
cat MyTranscriptome.fasta | parallel -q -j 24 --block 100M --recstart '>' --pipe blastx -db /blastdb/nr -num_threads 1 -evalue 1e-5 -outfmt "6 qseqid sseqid stitle pident length mismatch gapopen qstart qend sstart send evalue bitscore" -max_target_seqs 10 -max_hsps 1 > MyTranscriptome_vs_nr.outfmt.6
Before coming here to ask, I searched the posts related to this topic in this (and other) forums, but I did not find how to fix the problem.
JUST to clarify (edit): the code ran without problems in the classic way (using -num_threads set to my max. number of cores), so the base code works fine. My problem is the GNU parallel implementation.
Thank you for your time!
I just tried it with some made-up test data and it works fine for me, I get the output that I expected - what is the error you're seeing?
Oh, shame on me... I didn't try it on a subset of the data, and maybe it only shows output when the blast has finished. I mean, I ran this command for 12 h to try to calculate the % of the transcriptome blasted and estimate how much time I need. However, my output was an empty file. That put me into alert mode.
If it runs properly over a test dataset, the command is fine.
Thank you for your time, and sorry that I panicked and forgot to run a test file first.
Good to see it works for you :)
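For anyone landing here with the same empty-output scare: by default GNU parallel holds each job's output until that job finishes, which fits the OP's guess, so a quick run on a small subset is the easiest sanity check. Below is a minimal sketch of one way to cut such a subset. The function name fasta_head and the file names in the usage comments are made up for illustration and are not part of blast or parallel.

```shell
# Hypothetical helper: print the first N records of a FASTA file.
#   $1 = number of records to keep
#   $2 = path to the FASTA file
fasta_head() {
    # Count header lines (those starting with '>'); stop as soon as
    # record N+1 begins, otherwise print the current line unchanged.
    awk -v n="$1" '/^>/ { if (++seen > n) exit } { print }' "$2"
}

# Example usage (file names are placeholders):
#   fasta_head 100 MyTranscriptome.fasta > subset.fasta
# then run the same parallel/blastx command with subset.fasta as input.
```

If the subset produces the expected columns in a few minutes, the outfmt string is being passed through correctly and the full run is just slow to report.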
why not just
?
Isn't it with single quotes as -
Also, why not simply use threads as Pierre suggested?
Not all stages of blast are parallelized, so OP's approach has the potential to be faster. IO might be an issue with 24 threads, though.
I had a problem before with multiple concurrent IO while using blast; I thought it was a system-specific problem, though :\
If you are using a job scheduler on a cluster to manage these jobs then there is no need to/advantage of using parallel.