Really simple question but for some reason I cannot find a simple answer to this.
We are moving from SGE to SLURM on our HPC. On SGE, we often submit a job with a range of threads, then inside the job, read the number of threads we got and use that to configure multithreaded programs. Example:
$ cat test.sh
#!/bin/bash
echo "got $NSLOTS cpu slots"
$ qsub -pe threaded 4-16 test.sh
Your job 5597119 ("test.sh") has been submitted
$ cat test.sh.o5597119
got 15 cpu slots
What is the SLURM equivalent of this?
I have tried a million variations of sbatch test.sh --cpus-per-task x --ntasks y across different queues, but nothing seems to work. It's also not clear how to request a range of threads, nor how to figure out how many you got; I've seen examples online that use $SLURM_NTASKS and $SLURM_CPUS_PER_TASK, but neither of those variables exists inside any of my jobs. The SLURM docs have been pretty unhelpful.
Option 1: The simple command-line SLURM equivalent is:
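Presumably something along these lines, using srun with -n for the core count as discussed in the comments below (for a single multithreaded process, -c/--cpus-per-task would be the closer analogue; the value 4 is just an example):
$ srun -n 4 test.sh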
Option 2: If you want to submit it as a script, save the following in a file script.sh, then submit it with sbatch.
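A minimal version, again assuming -n for the core count and reusing the question's echo line as a placeholder body:
$ cat script.sh
#!/bin/bash
#SBATCH -n 4
echo "got $SLURM_NTASKS cpu slots"
$ sbatch script.sh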
Thanks, I'm using this with Nextflow so I'll need the sbatch format. However, this doesn't show how to use a range of threads, or how your script can know how many threads it was given.

I am not sure if there is a range option available with SLURM (someone else might chime in on that). If you are looking for job arrays then see the example here. -n is the number of cores assigned to the script. Make sure that number matches or exceeds what you are specifying within your script.

I see. Typically, the point is not to specify in the script how many threads, but instead to get the number of threads from the amount allocated by the scheduler, as sketched below.
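For instance, instead of hard-coding a thread count, the job script can read whatever SLURM actually allocated (a sketch only; my_program, the request of 8 CPUs, and the fallback of 1 are placeholders):
#!/bin/bash
#SBATCH --cpus-per-task=8
# Use the CPU count SLURM allocated for this task; default to 1 if the
# variable is not set (e.g. when the script is run outside of SLURM).
THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "got $THREADS cpu slots"
my_program --threads "$THREADS"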
I have always specified a number of cores I want with SGE/LSF/SLURM over the years. There is always a default allocation of cores available per user. Jobs will dispatch right away until that allocation is used up and the rest then pend in queue. Most fair use clusters always reserve enough cores so those submitting small/new jobs will always have slots available. I have never depended on the numbers of cores assigned by a cluster dynamically.
You may wish to talk with your admins to see if the policy you are used to still applies in case of SLURM.
The issue this targets is the situation where a node has e.g. 7 slots available and you requested 8; I'd rather just take the 7 than wait for a full 8 to open up. Similarly, when you have a CPU slot quota that does not divide evenly across your resource allocation, let the last jobs in your queue squeeze in with slightly fewer threads, since it could take hours for the other jobs to complete and make room for the full allotment.
While you wait for someone to respond, it would be best to check with your local sysadmins.
I have moved my answer to a comment since it does not satisfy your exact need for now. Having had access to large clusters for many years, I am spoiled.
For SLURM, I think the only resource you can ask for as a range is --nodes. Regarding the environment variables being set or not, it depends on how you run your script: the variables only appear inside the job when the corresponding options are given at submission. For example, submit a script that simply prints the SLURM variables it sees, once without the CPU options and once with them, and compare cat res1.out against cat res2.out; a sketch of this is below.
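A minimal sketch of that demonstration, assuming the script just echoes the two variables and that the two submissions differ only in whether --ntasks and --cpus-per-task are passed (file names, option values, and output values are illustrative):

$ cat script.sh
#!/bin/bash
# Report what SLURM says about this job's allocation
echo "SLURM_NTASKS=$SLURM_NTASKS"
echo "SLURM_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK"

$ sbatch --output=res1.out script.sh
$ cat res1.out
SLURM_NTASKS=
SLURM_CPUS_PER_TASK=

$ sbatch --output=res2.out --ntasks=2 --cpus-per-task=4 script.sh
$ cat res2.out
SLURM_NTASKS=2
SLURM_CPUS_PER_TASK=4

In other words, $SLURM_NTASKS and $SLURM_CPUS_PER_TASK are only defined in the job environment when --ntasks and --cpus-per-task are given at submission; if you need a value that is always present, $SLURM_CPUS_ON_NODE reports the number of CPUs allocated on the node.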