I have been running a relatively simple Snakemake pipeline that processes BAM files and aggregates a variety of metrics. It progresses as expected, then shuts down at a seemingly random point. This is what stdout and the log report:
...
16 of 39 steps (41%) done
[Wed Oct 9 13:48:45 2024]
Finished job 7.
17 of 39 steps (44%) done
[Wed Oct 9 13:48:45 2024]
Finished job 34.
18 of 39 steps (46%) done
[Wed Oct 9 13:48:56 2024]
Finished job 55.
19 of 39 steps (49%) done
[Wed Oct 9 13:50:06 2024]
Finished job 25.
20 of 39 steps (51%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-09T134300.644136.snakemake.log
When I search through the log, there is no additional information, no errors, and no tracebacks. If I rerun the command (with the --rerun-incomplete flag), it picks up at the exact job where it quit and ends up completing successfully. I have plenty of CPUs (70+) and RAM (500+ GB), so I can't imagine it's a resource issue. I have not been able to find any other information online about this.
Any advice or ideas are appreciated!
Did the dry run end correctly? Could you show us the rule that fails?
I will give a dry run a go and report back. That is the strange part - no specific rule fails, the pipeline just quits (usually around 45-55% complete). If I rerun the command, it picks back up and finishes as expected.
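Since the log never names a failing rule, one way to narrow it down might be to list the jobs that were announced in the log but never reported as finished. Below is a minimal sketch of that idea, not a tested diagnostic. It assumes each started job is announced with an indented "jobid: N" line (an assumption about the log layout, since that part is not shown above) and that completion is reported as "Finished job N." as in the excerpt. The function name unfinished_jobs is made up for illustration.

```python
import re
import sys

# Assumed log line shapes:
#   "    jobid: 7"      printed when a job is announced (assumption, not shown above)
#   "Finished job 7."   printed when a job completes (as in the log excerpt)
STARTED = re.compile(r"^\s*jobid:\s*(\d+)\s*$")
FINISHED = re.compile(r"^Finished job (\d+)\.\s*$")


def unfinished_jobs(log_text):
    """Return sorted job ids that were announced but never reported finished."""
    started, finished = set(), set()
    for line in log_text.splitlines():
        match = STARTED.match(line)
        if match:
            started.add(int(match.group(1)))
            continue
        match = FINISHED.match(line)
        if match:
            finished.add(int(match.group(1)))
    return sorted(started - finished)


if __name__ == "__main__":
    # Optional: pass the path of a Snakemake log file as the first argument.
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            print(unfinished_jobs(handle.read()))
```

Saved under a hypothetical name such as find_unfinished.py, it could be pointed at the log path printed at the end of the run (.snakemake/log/2024-10-09T134300.644136.snakemake.log). Any job ids it prints were still in flight when the shutdown began, which may point to the rule worth posting here.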