In simple terms, the individual computers in a cluster are nodes. They usually share a disk system, and a head node controls how jobs are assigned to them. Most of the time you don't get to choose exactly which node will run your job, though you can choose which group of nodes (partition) will run your jobs if you are a member of multiple groups. Another way of "choosing" your node is to specify a job configuration that only certain nodes can satisfy, but that way you are limiting your resources.
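(If your cluster runs SLURM, as the sbatch options below suggest, you can list the available groups of nodes - the partitions - and their per-node resources with sinfo; the exact columns depend on how your cluster is configured.)
sinfo                 # lists partitions, their state, and the nodes in each
sinfo -o "%P %c %m"   # partition name, CPUs per node, and memory per node (MB)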
Continuing with the same logic, the threads of an individual computer are tasks. If you ask for a single node and 64 tasks (basically you want your job to run with 64 threads: --nodes=1 --ntasks=64), and none of your nodes has more than 40 threads, your job will never run, because you have specified a configuration that can't be satisfied. If you specify --nodes=2 --ntasks=64, your job will fill one node until all of its threads are occupied (say, 40), and the remaining tasks (say, 24) will run on a second node. If you specify --nodes=2 --ntasks=64 --ntasks-per-node=32, your job will be split evenly, with each of the two nodes running 32 tasks.
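To make the three configurations above concrete, here they are as sbatch headers (using the same example figure of 40 threads per node; your cluster's nodes may differ):
# never starts on 40-thread nodes: no single node can supply 64 tasks
#SBATCH --nodes=1
#SBATCH --ntasks=64

# starts: 40 tasks fill one node, the remaining 24 go to a second node
#SBATCH --nodes=2
#SBATCH --ntasks=64

# starts: an even split of 32 tasks on each of the two nodes
#SBATCH --nodes=2
#SBATCH --ntasks=64
#SBATCH --ntasks-per-node=32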
I am almost certain you will get suggestions different from what I propose below, and you should weigh all of them in consultation with your HPC administrator.
I never run jobs over multiple nodes unless it is absolutely necessary - and even then I don't do it. Scheduling a job over multiple nodes will typically make it wait longer, because the request is more complex and harder to schedule. I am talking here about a reasonably busy cluster - it makes no difference if only a small fraction of the nodes are running jobs. I run jobs on a single node, with ntasks never greater than the number of threads per node. That means your job runs on 40 threads instead of your desired 64, which makes it slower, but it increases the likelihood that the job will be scheduled immediately instead of waiting for multiple nodes with the specified number of tasks to become free. It also means you can start a larger number of simultaneous jobs.
To sum up: I suggest you find out how many nodes are in your cluster and how many threads each node has, and then submit your jobs like so:
#SBATCH --nodes=1
#SBATCH --ntasks=N
where N <= threads per node. In my experience this is the fastest way to get multiple jobs done. The only time I have benefitted from --nodes > 1 was for single jobs that can use more threads than are available on an individual node. Even then, I sometimes waited 1-2 days for multiple nodes to become completely free for my job. Waiting that long can negate the gain from running on a large number of threads, and in my experience it is not worth it unless the job is truly long.
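For completeness, here is a minimal sketch of such a single-node submission script (the job name, time limit, and program name are placeholders, and 40 is just the example thread count used above):
#!/bin/bash
#SBATCH --job-name=myjob       # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks=40            # keep this <= the threads available per node on your cluster
#SBATCH --time=24:00:00        # placeholder limit; adjust to your cluster's policy

# hypothetical threaded program; SLURM exports the requested task count as SLURM_NTASKS
my_program --threads "$SLURM_NTASKS"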
Thank you for this explanation, it was very helpful. I seem to be having an issue with running out of memory. To avoid running out of disk space I've been generating all the sam/bam files in the scratch folder, which I think has worked. But I still seem to be encountering another error during alignment, and I believe it is because I am exceeding RAM when several samples are run at the same time. Here is the report I see:
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 24
CPU Utilized: 06:36:25
CPU Efficiency: 31.31% of 21:06:00 core-walltime
Job Wall-clock time: 00:52:45
Memory Utilized: 19.12 GB
Memory Efficiency: 16.60% of 115.22 GB
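(For reference, this looks like the output of SLURM's seff utility; if it is available on your cluster, you can produce the same summary for any completed job:)
seff <jobid>   # prints the per-job CPU and memory efficiency summary shown above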
My memory efficiency is always around ~15%, so I'm not sure why I'm running into this issue. I've been using --nodes=1 --ntasks=24 --partition=shared for the 100 samples, each submitted as an individual sbatch job. Do you think the shared partition has something to do with it?
The reasons I believe the issue is RAM are the Hisat2 error I receive, which I've seen before when RAM was a problem, and the fact that there is no problem when I don't run the samples concurrently.
Do you have any suggestions?