How many CPUs should I use in my SLURM job for my BLAST?
19 months ago
M. ▴ 40

Hello,

I am trying to run a blastp search of 23 sequences against a BLAST database containing approximately 15,000 sequences (in a for loop that gives me separate outputs). As a newcomer to HPC systems, I am unsure how many CPUs or how much memory I will need for this task. Can you advise me on how to set these parameters correctly? Is there a way to determine the optimal values, or will my intuition improve with experience?

blast slurm hpc cpu • 1.5k views

If your HPC uses a job scheduler, submit the jobs through it. You don't say how large the database and queries are, but assuming they are nothing like nt/nr, you may be able to get away with 8 cores and 30 GB of RAM. You may simply need to run a few jobs and try things out.
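
As a rough illustration of what such a submission could look like on SLURM, here is a minimal Python sketch that writes out a batch script. The function name `render_sbatch_script`, the file names, and the one-hour time limit are assumptions for illustration only; the 8 CPUs and 30 GB simply mirror the starting point suggested above. Many clusters also require site-specific options such as a partition or account, which are not shown.

```python
import shlex


def render_sbatch_script(query, db, out, cpus=8, mem_gb=30, time_limit="01:00:00"):
    """Return the text of a SLURM batch script that runs one blastp job.

    All defaults are illustrative starting values, not recommendations.
    """
    blast_cmd = (
        f"blastp -query {shlex.quote(str(query))} -db {shlex.quote(str(db))} "
        f"-out {shlex.quote(str(out))} -outfmt 6 "
        '-num_threads "$SLURM_CPUS_PER_TASK"'
    )
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=blastp",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --output=blastp_%j.log",
        "",
        "# Match BLAST threads to the CPUs SLURM actually allocated.",
        blast_cmd,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder file names; redirect the output to a file and pass it to sbatch.
    print(render_sbatch_script("queries.fasta", "my_db", "results.tsv"))
```

Tying `-num_threads` to `$SLURM_CPUS_PER_TASK` keeps the BLAST thread count in step with whatever you request, so changing `--cpus-per-task` between test runs is the only edit needed.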


I don't know if there is a job scheduler; I'll look into it. I don't remember exactly, but my database was ~30 megabytes. Since it's a small one, 4 cores and 32 GB of RAM handled my job in less than 2 minutes. I gave the query sequences one by one in a for loop and got separate outputs. I haven't had the chance to try this with databases of different sizes, but I'll try different scenarios and learn from them. Thank you for your answer!

19 months ago

RAM needs depend on the lengths of the sequences as well.

Your database is small (you can check the file size), so just over-provision, based on the standard node size in your cluster.

Normally you should avoid a for loop, because it requires more I/O and CPU than running everything as one batch. (Here your numbers are quite small, so the cost of the extra CPU cycles will be negligible.)
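
If the loop exists only to get one output file per query, one possible alternative is to merge the queries, run BLAST once, and split the result afterwards. The sketch below assumes tabular output (`-outfmt 6`, where the first column is the query ID) and query IDs that are safe to use as file names. The helper names `merge_fasta` and `split_tabular_by_query` are made up for this example.

```python
from collections import defaultdict
from pathlib import Path


def merge_fasta(paths, merged_path):
    """Concatenate single-sequence FASTA files into one multi-FASTA file."""
    with open(merged_path, "w") as merged:
        for path in paths:
            text = Path(path).read_text()
            # Make sure each record ends with a newline so headers never fuse.
            if text and not text.endswith("\n"):
                text += "\n"
            merged.write(text)


def split_tabular_by_query(tabular_path, out_dir):
    """Split BLAST tabular output (-outfmt 6) into one file per query ID."""
    hits = defaultdict(list)
    with open(tabular_path) as handle:
        for line in handle:
            if line.strip():
                # Column 1 of the default -outfmt 6 layout is the query ID.
                hits[line.split("\t", 1)[0]].append(line)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for query_id, lines in hits.items():
        # Assumes the query ID contains no path separators.
        target = out_dir / f"{query_id}.tsv"
        target.write_text("".join(lines))
        written[query_id] = target
    return written
```

Note that a query with no hits produces no lines in tabular output, so it gets no file here.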


Thank you for your answer. I understand that using loops is not good because of the computational cost, but how should I do tasks like this when the numbers are not this small?


The way BLAST is engineered, people typically run one multithreaded ("batch") BLAST analysis (i.e., with one multi-FASTA query file). You probably want to benchmark some of your analyses to see how the RAM/CPU needs scale.
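
One way to do that benchmarking is to run the same search at a few thread counts and record wall time and peak memory for each run. The sketch below is only one possible approach: `run_and_measure` and `blastp_command` are hypothetical helper names, the thread counts are arbitrary, and it relies on Unix-only calls (`os.wait4`) and Python 3.9+. For jobs submitted through SLURM, the scheduler's own accounting (for example `sacct` with the `Elapsed` and `MaxRSS` fields) may report similar numbers, if your site has it enabled.

```python
import os
import subprocess
import sys
import time


def run_and_measure(command):
    """Run a command; return (exit_code, wall_seconds, max_rss) for that child."""
    start = time.perf_counter()
    proc = subprocess.Popen(command)
    # os.wait4 reports resource usage for this specific child (Unix only).
    _, status, usage = os.wait4(proc.pid, 0)
    wall_seconds = time.perf_counter() - start
    exit_code = os.waitstatus_to_exitcode(status)  # Python 3.9+
    proc.returncode = exit_code  # tell Popen the child was already reaped
    # ru_maxrss is in kilobytes on Linux (bytes on macOS).
    return exit_code, wall_seconds, usage.ru_maxrss


def blastp_command(query, db, out, threads):
    """Build a blastp argument list; the file names are placeholders."""
    return ["blastp", "-query", query, "-db", db, "-out", out,
            "-outfmt", "6", "-num_threads", str(threads)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (assumed): python bench.py queries.fasta my_db
    query, db = sys.argv[1], sys.argv[2]
    for threads in (1, 2, 4, 8):
        code, secs, rss = run_and_measure(
            blastp_command(query, db, f"bench_{threads}.tsv", threads))
        print(f"threads={threads} exit={code} wall={secs:.1f}s max_rss={rss}")
```

If wall time stops improving as threads increase, requesting more CPUs than that is unlikely to help for that query and database.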


Oh, okay. I'll try that. Thank you for your help!
