Hello there!
I need your input on an issue I'm experiencing when running my Snakemake pipeline on the university cluster.
I made a Snakemake pipeline that has a lot of steps and deals with different kinds of data (both metagenomes and single isolates). It can take raw reads or assembled genomes as input and then, depending on the input data, perform a number of different analyses such as trimming, quality checking, assembly, renaming of headers, resistance identification, etc.
I am trying to make a tool that is easy for other people to use, so I implemented Snakemake's support for cluster engines. I have created a Snakemake profile that configures and submits jobs to the cluster, along with various Snakemake parameters. My config file is the following:
cluster: qsub -W group_list={cluster.proj} -A {cluster.proj} -l nodes=1:ppn={cluster.core},mem={cluster.vmem},walltime={cluster.time} -e /dev/null -o /dev/null
cluster-config: profile-sge/cluster_computerome.json
restart-times: 3
latency-wait: 10
use-conda: True
jobs: 50
printshellcmds: True
keep-going: True
rerun-incomplete: True
debug-dag: True
verbose: True
show-failed-logs: True
The cluster_computerome.json file specifies cluster-specific information per Snakemake rule, such as the number of cores, memory, and walltime. It looks like this:
"assembly_metagnm_rr" :
{
"core" : "40",
"time" : "40:00:00",
"vmem" : "180G",
"proj" : "cge"
},
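(For context, the whole file is just one JSON object mapping rule names to blocks like the one above. A minimal complete sketch would look like this; the "__default__" fallback entry and its values here are illustrative, not copied from my actual file:)

    {
        "__default__": {
            "core": "1",
            "time": "01:00:00",
            "vmem": "4G",
            "proj": "cge"
        },
        "assembly_metagnm_rr": {
            "core": "40",
            "time": "40:00:00",
            "vmem": "180G",
            "proj": "cge"
        }
    }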
When I log in to an interactive node on my university cluster and execute the pipeline directly there, everything works fine. The workflow runs correctly, with no errors at all. (I use the following command: snakemake --use-conda --jobs 10...)
The issues appear when I use my cluster profile (the profile contains the files needed for job submission on the cluster) to execute the workflow. (I use the following command: snakemake --profile profile-sge.)
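(If it helps: Snakemake resolves --profile profile-sge by looking for a directory of that name containing a config.yaml, so my layout is roughly:)

    profile-sge/
        config.yaml                 # the options listed above
        cluster_computerome.json    # the per-rule cluster resources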
The pipeline crashes very often when it tries to execute different rules. Most of the time it gives me the following error:
Traceback (most recent call last):
  File "/services/tools/snakemake/6.9.1/lib/python3.9/site-packages/snakemake/__init__.py", line 699, in snakemake
    success = workflow.execute(
  File "/services/tools/snakemake/6.9.1/lib/python3.9/site-packages/snakemake/workflow.py", line 1069, in execute
    success = self.scheduler.schedule()
  File "/services/tools/snakemake/6.9.1/lib/python3.9/site-packages/snakemake/scheduler.py", line 440, in schedule
    self._finish_jobs()
  File "/services/tools/snakemake/6.9.1/lib/python3.9/site-packages/snakemake/scheduler.py", line 540, in _finish_jobs
    self.running.remove(job)
KeyError: rename_headers_si_assemblies
The rule name in the KeyError (here rename_headers_si_assemblies) changes depending on which rule is being executed.
I think the issue has to do with Snakemake checking that the output files have actually been created. I have noticed that most of the time, before crashing, it gives me the following message:
Waiting at most 10 seconds for missing files.
But when I go and check, the output files are there. They are in the correct output folder!
I tried giving Snakemake more time and increased the latency-wait from 10 to 30 seconds, but that did not change much.
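(Concretely, that was just this one line changed in profile-sge/config.yaml:)

    latency-wait: 30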
I would appreciate any ideas on the matter!
Thank you and sorry for the long post.
Have you tried an updated version of Snakemake? Do you know if both clusters have the same workload manager?
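(For example, something like this on each machine should confirm both points; note that the exact version flag depends on the batch system:)

    snakemake --version     # e.g. 6.9.1
    qsub --version          # works on PBS Pro/Torque; on SGE, qsub -help prints the version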
The university cluster admins provide version 6.9.1, which is a bit old, I must say... I can ask them to update to a newer version and try again... What do you mean by "both clusters"?
Thank you for your reply
Crossposted here.