Hi,
I'm fairly new to snakemake - I've designed my RNAseq pipeline as a Snakefile. It runs STAR and RSEM and a few shell commands.
One of the steps I wish to implement is to upload a generated BAM file to cloud storage once the pipeline is done computing downstream results. I wish to use rclone for this, which is very much like rsync for cloud storage locations. However, rsync/rclone do not produce output files. They simply copy/move a file from source to destination. How can I add a snakemake rule that runs rsync/rclone when there is no "output" that rsync/rclone generate? I don't want to use the random content that a redirection would produce as the "output" parameter - it is too unreliable. This should be a simple solution but maybe I am too close to the problem.
Is there a way I can do this:
    rule rsync_copy:
        input:
            "{sample}.bam"
        output:
            ??
        shell:
            """
            rsync -avPe ssh "{input}" "user@remote:/bam_files/{wildcards.sample}/"
            """
This is actually a pretty good solution. I had a discussion offline with a few colleagues and the idea is to add a

    rsync ... > rsync_copy.ok && mv rsync_copy.ok $LOG_DIR/

and have $LOG_DIR/rsync_copy.ok as the output file. On successful completion, the exit code is 0. This part is always verified by snakemake (by always running bash in strict mode), so rule failure on rsync/rclone failure is not a problem.
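The approach described above could be sketched as a rule roughly like the following. This is only a sketch under assumptions: the logs/ directory and the per-sample .ok file name are hypothetical stand-ins for $LOG_DIR/rsync_copy.ok, chosen so that the output carries the {sample} wildcard, and user@remote is the placeholder from the question.

```snakemake
rule rsync_copy:
    input:
        "{sample}.bam"
    output:
        # Hypothetical marker path; it must contain the same wildcard as the input.
        "logs/{sample}.rsync_copy.ok"
    shell:
        """
        # Capture rsync's log in a temporary file first, so the marker only
        # appears under its final name after rsync has exited with code 0.
        rsync -avPe ssh "{input}" "user@remote:/bam_files/{wildcards.sample}/" > "{output}.tmp"
        mv "{output}.tmp" "{output}"
        """
```

Because the shell block runs in strict mode, a non-zero exit from rsync stops the script before the mv, so the marker file is never created for a failed transfer.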
That's actually interesting to know that the exit code would break the rule execution. I think touching .ok files is fairly common practice to confirm execution of long-running processes on clusters. I used to abuse this concept, though, to bend snakemake into executing rules, back before dynamic output was handled as well as it is now.
Your recommendation is actually 100% the official way to go. Snakemake calls these empty touch-files "flag files": https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#flag-files
They even provide touch("flag_file") for this purpose.
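For the original upload question, a minimal sketch using touch() might look like this. Several names here are assumptions for illustration and not part of the thread: the SAMPLES list, the flags/ directory, the rclone remote called myremote, and the {sample}.genes.results input, which stands in for whatever downstream result should exist before the upload starts.

```snakemake
# Hypothetical sample names; replace with the pipeline's real sample list.
SAMPLES = ["sampleA", "sampleB"]

rule all:
    input:
        # Requesting the flag files makes the upload part of the default target.
        expand("flags/{sample}.upload.done", sample=SAMPLES)

rule upload_bam:
    input:
        bam="{sample}.bam",
        # Assumed downstream result; listing it delays the upload until it exists.
        results="{sample}.genes.results"
    output:
        # touch() creates this empty flag file once the shell command succeeds.
        touch("flags/{sample}.upload.done")
    shell:
        """
        rclone copy "{input.bam}" "myremote:bam_files/{wildcards.sample}/"
        """
```

Running snakemake with the default target would then schedule the upload for each sample after its inputs are ready, and a rerun skips samples whose flag file is already present.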