Hello,
I am trying to run Parabricks on a SLURM cluster via Snakemake v8.10.6. The tool runs just fine in bash scripts, but I am struggling to send Snakemake jobs to the GPUs. I simplified the problem by trying to execute a Snakefile with a basic GPU-specific command.
When I run the following Snakefile:
rule gpu_test:
    output:
        "~/jeter.txt"
    resources:
        gpu=1,
        tmpdir='temp',
        mem_mb=50000
    shell:
        """
        module load nvhpc/23.9
        nvidia-smi -L > ~/jeter.txt
        """
with the following profile:
default-resources:
  - qos=short
  - time="24:00:00"
  - mem_mb=8000
  - slurm_extra="--gres=gpu:1"
restart-times: 1
max-jobs-per-second: 10
max-status-checks-per-second: 1
local-cores: 1
latency-wait: 60
jobs: 100
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: False
and the command:
snakemake -s snakefile -j 1
the jobs fail with the following error: nvidia-smi: command not found
(The very same command works just fine when sent to a GPU node in a bash script.)
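For reference, the standalone batch script that does work looks roughly like this (the #SBATCH options other than --gres mirror my profile defaults and are just placeholders):

#!/bin/bash
#SBATCH --gres=gpu:1
#SBATCH --time=24:00:00
#SBATCH --mem=8G
# on the GPU node, nvidia-smi is found after loading the module
module load nvhpc/23.9
nvidia-smi -L > ~/jeter.txt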
I tried downgrading to Snakemake 7.32, but the problem persists.
Any hint on how to solve this issue would be appreciated.
Thanks in advance
this pattern is copyrighted
As a start, in your shell section, check that the module command itself works,
and check whether the nvidia tools are available anywhere in the MODULEPATH.
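Something like this inside the rule's shell section would show what the job environment actually sees (the echo messages are only illustrative):

module load nvhpc/23.9 || echo "module load failed"
echo "MODULEPATH is: $MODULEPATH"
module avail nvhpc 2>&1 | head        # list matching modules, if any
command -v nvidia-smi || echo "nvidia-smi not on PATH"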