for i in *.fasta; do ls *.fasta | parallel -a - blastp -query {} -db mydatabase -evalue 0.00001 -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.xls ; done
It works on my Mac, but a run takes a whole day to finish. I have 44 fasta files in the directory, and I noticed that the blast was actually repeated many times before it stopped. Are there any alternatives I could try?
I have 44 fasta files in the directory, and I noticed that the blast
was actually repeated many times before it stopped.
It is possible that you are exhausting a hardware resource on your Mac (most likely RAM). Have you made sure that you are able to complete one of these jobs with the database you are using before trying to start many in parallel?
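One way to check this is to run a single query on its own and time it before starting anything in parallel. A minimal sketch, reusing the options from the question; the function name `blast_one`, the file name `sample.fasta` and the `.tsv` output extension are illustrative assumptions, not something from the original post:

```shell
# Hypothetical helper: run blastp on ONE FASTA file and report the wall time.
# Options are copied from the command in the question; only the output
# extension differs (.tsv, since -outfmt 6 writes tab-separated text).
blast_one() {
  query=$1
  start=$(date +%s)
  blastp -query "$query" -db mydatabase -evalue 0.00001 \
    -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 \
    -out "${query%.fasta}.tsv" || return 1
  end=$(date +%s)
  echo "$query finished in $((end - start)) s"
}

# Usage (commented out so that pasting the block does not start a run):
# blast_one sample.fasta
```

Watching Activity Monitor or `top` while that single job runs should show whether one job alone already strains memory.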
You are listing your files multiple times, then looping unnecessarily before running the command in parallel. The loop body never uses $i, so each of the 44 iterations reruns the full parallel job over all 44 files. That is roughly 44 times the work needed, not just double.
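A loop-free version of the command from the question could look like the sketch below. It assumes GNU parallel; the wrapper name `run_blast_all` and the `.tsv` extension are illustrative choices, and the blastp options are copied from the question unchanged:

```shell
# Hypothetical wrapper: one blastp job per FASTA file given as an argument.
# GNU parallel replaces {} with the input file and {.} with the file name
# minus its extension; by default it runs one job per CPU core.
run_blast_all() {
  parallel blastp -query {} -db mydatabase -evalue 0.00001 \
    -qcov_hsp_perc 50 -outfmt 6 -max_target_seqs 1 -out {.}.tsv ::: "$@"
}

# Usage (commented out so that pasting the block does not start a run):
# run_blast_all *.fasta
```

If memory turns out to be the limit, adding `-j 2` (or another small number) after `parallel` caps how many blastp jobs run at once.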
Exactly how long it will take under ideal circumstances is not easy to say ahead of time. The process will run faster with fewer, shorter sequences, but it also depends on how quickly a good match can be found (better matches can be returned faster).
Do us a favour and don't call your output files .xls ;-) With -outfmt 6 the output is plain tab-separated text, so .tsv or .txt is more accurate.
How big are the fasta files (in size, or number of entries)?
The files are 1 to 1.5 MB each.
Thanks for your comment. As jrj.healey suggested, I removed the loop and it is working well now.