I have Snakemake hooked up to an S3 account, and I'm wanting to delete certain temp() files after processing our pipeline.
I have a rule that designates certain files as temp(). Below is an example of one:
#Split rep element mapped bam file into subfiles rule split_rep_bam: input:
'rep_element_pipeline/{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam' output:
temp('rep_element_pipeline/AA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/AC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/AG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/AN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/AT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/CA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/CC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/CG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/CN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/CT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/GA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/GC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/GG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/GN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/GT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/NA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/NC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/NG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/NN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/NT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/TA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/TC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/TG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/TN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp'),
temp('rep_element_pipeline/TT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp') conda:
'../envs/rep_element.yaml' params:
fp=full_path shell:
'perl ../scripts/split_bam_to_subfiles.pl {input[0]} '
'{params.fp}/rep_element_pipeline/'
I've been running Snakemake as such:
snakemake --default-remote-provider S3 --default-remote-prefix '$s3' --use-conda --cores 32 --rerun-incomplete --printshellcmds --delete-temp-output
Where I've specified the --delete-temp-output option.
When running this rule, it appears snakemake is deleting these files. However, these files still persist in S3. Does anyone know why these files aren't being deleted in S3? The temp() files appear to be deleted locally on my machine, but are uploaded to S3.
Building DAG of jobs...
Deleting 20211222-rp030/rep_element_pipeline/AA.APP-01_rep_1_Ribo_R1.fastq.gz.mapped_vs_bt2_hg38.sam.tmp
Deleting 20211222-rp030/rep_element_pipeline/AC.APP-01_rep_1_Ribo_R1.fastq.gz.mapped_vs_bt2_hg38.sam.tmp
I would really just like for my temp() files not to be uploaded to S3, as they take up a lot of disk space.
cross posted: https://stackoverflow.com/questions/70702438/snakemake-temp