Question

Diamond blastx: does it support SLURM?

3.2 years ago

emiliomastriani ▴ 40

Hello, I have a cluster composed of 7 nodes (20 threads), running SLURM on Ubuntu 20.04. I am trying to run the diamond blastx process on the cluster, but it just runs N-times the same process ... Please, can someone tell me what I am missing? Thank you

#!/bin/bash
#
#SBATCH --job-name=DiamondBlastx
#SBATCH --output=DiamondBlastx_%j.out
#SBATCH --error=DiamondBlastx_%j.err
#SBATCH --ntasks=20

threads=20

srun diamond blastx \
                    -d /storage/RefSeq/refseq_protein_nonredund_diamond.dmnd \
                    -p $threads \
                    -q /storage/homehpc/emastriani/rRNACont/reads_not_rRNA.fq \
                    -a blastx_diamond \
                    -t $TMPDIR \
                    -b 1.0 \
                    -k 1 \
                    -v \
                    --log

hpc diamond sbatch slurm • 1.9k views

ADD COMMENT • link updated 3.2 years ago by colindaven 7.0k • written 3.2 years ago by emiliomastriani ▴ 40

composed of 7 nodes (20 threads)

That does not make sense. Most modern CPUs have multiple cores, and each core in turn often supports 2 threads (Intel Xeon CPUs, for example). Each server node commonly has 2 sockets/CPUs (4-socket servers are available but significantly more expensive).

So if you actually have 7 physical nodes (servers) then they are unlikely to be limited to 20 threads.

but it just runs N-times the same process

What does that mean? The DIAMOND job above should run with the number of threads/cores given by -p.

ADD REPLY • link 3.2 years ago by GenoMax 148k

Let me try to be clearer. On my cluster, 4 machines are dual-core (together they can run 8 processes) and 3 machines are quad-core (together they can run 12 processes), so the cluster has 20 CPUs in total. This is why I set --ntasks=20 (SBATCH section) and -p 20 (diamond blastx parameter). I was expecting the process executed by srun to "distribute" the workload over the available resources (the 20 CPUs mentioned), but it does not. I monitored the job execution node by node, and the same process seems to be running on every CPU. Also, if I run the process on one node (nodes=1) with four threads (-p 4), it works (slowly, but it works). If I run the process on more than one node, it eventually fails, killed by srun because of OUT OF MEMORY, even though the -b parameter is unchanged (-b 1.0).

I hope you can help me.

Thank you

ADD REPLY • link 3.2 years ago by emiliomastriani ▴ 40

To be more precise:

On 4 machines I have an AMD G-T56N Processor, where lscpu reports Thread(s) per core: 1, Core(s) per socket: 2, Socket(s): 1. That means only 2 threads can run simultaneously.
On 3 machines I have an AMD Embedded G-Series GX-420GI Radeon R7E, where lscpu reports Thread(s) per core: 2, Core(s) per socket: 2, Socket(s): 1. That means only 4 threads can run simultaneously.

The total number of threads that can run simultaneously on my cluster is 20.
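
The count above can be checked with a few lines of shell arithmetic. This is only a sketch: `logical_cpus` is a made-up helper, and its inputs are the three lscpu values quoted above.

```shell
#!/bin/bash
# Logical CPUs on one node = threads per core x cores per socket x sockets
# (the three values quoted from lscpu above).
logical_cpus() {
    local threads_per_core=$1 cores_per_socket=$2 sockets=$3
    echo $(( threads_per_core * cores_per_socket * sockets ))
}

small=$(logical_cpus 1 2 1)   # the 4 G-T56N nodes
large=$(logical_cpus 2 2 1)   # the 3 GX-420GI nodes

# Sum over the whole cluster, and the most any single node can offer.
cluster_total=$(( 4 * small + 3 * large ))
largest_node=$(( small > large ? small : large ))

echo "per node: ${small} (x4 nodes), ${large} (x3 nodes)"
echo "cluster total: ${cluster_total}"
echo "largest single node: ${largest_node}"
```

The 20 is a sum across all nodes. The largest single node offers 4, which is the figure that matters for a program running as one multithreaded process.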

ADD REPLY • link 3.2 years ago by emiliomastriani ▴ 40

The number of threads is one thing, but DIAMOND requires a significant amount of RAM as well. I am not sure how large your database is, but this sounds like a difficult task for the setup you describe. I am also not sure what kind of interconnect links these machines (are you using gigabit Ethernet?), but that would be a major bottleneck too.

ADD REPLY • link 3.2 years ago by GenoMax 148k

score 0 · Answer 1 · 2021-10-15

Try this: use srun -c (--cpus-per-task), NOT --ntasks.

Note that --ntasks=20 will start the search 20 times, which is not what you want.

Note that if you have tiny machines with 4 or 8 cores, reduce threads to something like 4 to see whether the jobs start running. 20 means 20 threads on one server!

You seem to have the misconception that SLURM will let you split a single big DIAMOND job across many PCs. This is not the case. SLURM will let you run, say, 50 jobs on your architecture. Those that cannot run yet (no threads available) will wait until a machine becomes available.
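
One way to use several nodes under that model is to cut the input into independent pieces and submit one job per piece. Below is a minimal sketch, assuming the FASTQ file uses plain 4-line records and GNU split is available. `split_fastq` and the `chunk_` prefix are made-up names, not part of DIAMOND or SLURM.

```shell
#!/bin/bash
# Split a FASTQ file into roughly equal chunks without cutting a record.
# Assumes every record is exactly 4 lines (no wrapped sequences).
# Usage: split_fastq reads.fq 7 chunk_   -> chunk_000, chunk_001, ...
split_fastq() {
    local input=$1 n_chunks=$2 prefix=$3
    local total_lines records per_chunk
    total_lines=$(wc -l < "$input")
    records=$(( total_lines / 4 ))
    if [ "$records" -eq 0 ]; then
        echo "no complete FASTQ records in $input" >&2
        return 1
    fi
    # Ceiling division, so at most n_chunks files are produced.
    per_chunk=$(( (records + n_chunks - 1) / n_chunks ))
    # -l: lines per chunk (a multiple of 4); -d -a 3: numeric suffixes 000, 001, ...
    split -l $(( per_chunk * 4 )) -d -a 3 "$input" "$prefix"
}
```

Each chunk could then go to its own single-node job, with the per-chunk results combined afterwards. Every job still has to load the database, so the RAM available per node remains the limit.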

Also note the RAM requirements, and experiment with sleep jobs to get a feel for how SLURM works.

Put this in your script and play with it to test SLURM:

srun -c 1 sleep 10 &
squeue 
srun -c 2 sleep 10 &
srun -c 4 sleep 10 &
srun -c 8 sleep 10 &



#!/bin/bash
#
#SBATCH --job-name=DiamondBlastx
#SBATCH --output=DiamondBlastx_%j.out
#SBATCH --error=DiamondBlastx_%j.err


threads=20   # 20 means 20 threads on one server; lower it (e.g. to 4) on small nodes

srun -c $threads diamond blastx \
                    -d /storage/RefSeq/refseq_protein_nonredund_diamond.dmnd \
                    -p $threads \
                    -q /storage/homehpc/emastriani/rRNACont/reads_not_rRNA.fq \
                    -a blastx_diamond \
                    -t $TMPDIR \
                    -b 1.0 \
                    -k 1 \
                    -v \
                    --log
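
A possible variant of the script above requests the CPUs in the #SBATCH header as well, so the allocation matches what srun -c asks for. This is a sketch under assumptions, not a tested recipe for this cluster. The value 4 is a guess at what one node can offer, the `${TMPDIR:-/tmp}` fallback and the dry-run branch are additions for illustration, and the diamond options are copied unchanged from the question.

```shell
#!/bin/bash
#SBATCH --job-name=DiamondBlastx
#SBATCH --output=DiamondBlastx_%j.out
#SBATCH --error=DiamondBlastx_%j.err
#SBATCH --nodes=1            # one multithreaded process, so one node
#SBATCH --ntasks=1           # start diamond once, not once per task
#SBATCH --cpus-per-task=4    # assumed value: what a single node here can offer

# Inside a job, SLURM sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the script is run outside SLURM.
threads=${SLURM_CPUS_PER_TASK:-1}

# Build the command as an array so it can be run or printed unchanged.
cmd=(diamond blastx
     -d /storage/RefSeq/refseq_protein_nonredund_diamond.dmnd
     -p "$threads"
     -q /storage/homehpc/emastriani/rRNACont/reads_not_rRNA.fq
     -a blastx_diamond
     -t "${TMPDIR:-/tmp}"
     -b 1.0
     -k 1
     -v
     --log)

if command -v srun >/dev/null 2>&1; then
    # Pass -c explicitly so the step gets the same CPU count as the allocation.
    srun -c "$threads" "${cmd[@]}"
else
    # No SLURM on this machine: only show what would be launched.
    echo "would run: srun -c $threads ${cmd[*]}"
fi
```

Taking the thread count from SLURM_CPUS_PER_TASK keeps diamond's -p and the allocation in step, so only the #SBATCH line needs changing when testing other sizes.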