I have a Snakemake Snakefile for an ATAC-seq pipeline that I have been using for ages, and I'm trying to add a rule to create bigWig files from BAM files. I'm using the bamCoverage program from the deepTools package to do this.
The snakemake -np output works fine, and the command Snakemake prints for each iteration of bamCoverage works fine if I run it manually, i.e. not via Snakemake:
bamCoverage -b bam_files/ATAC24_3_fetalMG.srtd.noMit.bam -o \
big_wig_files/ATAC24_3_fetalMG.srtd.noMit.RPKM.bin10.bw \
--outFileFormat bigwig -p 6 --ignoreDuplicates --normalizeUsing RPKM \
--blackListFileName /home/c1477909/blacklist_files/hg19.blacklist.bed \
--binSize 10 --extendReads
The snakemake error I get is this:
RuleException:
CalledProcessError in line 102 of ATAC_24to31_foetal_hMG_May19/Snakefile:
Command ' set -euo pipefail; {code posted above is here}
' returned non-zero exit status 127.
File "/c8000xd3/big-c1477909/foetal_hMG_analysis/ATAC_24to31_foetal_hMG_May19/Snakefile"
line 102, in __rule_deeptools_make_bigwigs
File "/home/.conda/envs/snakemake/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
This seems to be caused by the thread.py script. I've had a look at it, but I'm not 100% sure what it does. I assume it has something to do with setting cores/threads for the job, and I have tried changing the thread setting both within the rule and in the actual command, but I keep getting the same error. The line 102 that the error refers to in the Snakefile is the last line of the rule shown below.
This is the code for the rule:
rule deeptools_make_bigwigs:
    input:
        rules.remove_mit_reads.output
    output:
        "big_wig_files/{sample}.srtd.noMit.RPKM.bin10.bw"
    threads: 6
    log:
        "logs/deeptools_make_bigwigs/{sample}.log"
    shell:
        "bamCoverage -b {input} -o {output} --outFileFormat bigwig "
        "-p 6 --ignoreDuplicates --normalizeUsing RPKM "
        "--blackListFileName /home/blacklist_files/hg19.blacklist.bed "
        "--binSize 10 --extendReads"
I also use a cluster_config file to set the number of cores/threads individually for each job sent to the cluster:
__default__:
    num_cores: 1
    maxvmem: 5G
fastqc:
    num_cores: 1
    maxvmem: 8G
bowtie2:
    num_cores: 8
    maxvmem: 4G
sort_bam:
    num_cores: 8
    maxvmem: 6G
deeptools_make_bigwigs:
    num_cores: 6
    maxvmem: 4G
homer_annotation:
    num_cores: 6
    maxvmem: 5G
homer_motif_analysis:
    num_cores: 6
    maxvmem: 4G
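To make the intent of this file concrete: as far as I understand it, Snakemake uses the __default__ entry for any rule that has no entry of its own, and a rule's own keys override the defaults. The values only reach the scheduler if the --cluster submission string refers to them (e.g. as {cluster.num_cores}). Below is a minimal Python sketch of that lookup, assuming per-key override; resources_for is a hypothetical helper written for illustration, not a Snakemake function, and only two of the entries above are reproduced.

```python
# Two entries copied from the cluster config above, as a plain dict.
cluster_config = {
    "__default__": {"num_cores": 1, "maxvmem": "5G"},
    "deeptools_make_bigwigs": {"num_cores": 6, "maxvmem": "4G"},
}


def resources_for(rule_name, config):
    """Hypothetical helper: start from __default__, then apply rule-specific keys."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule_name, {}))
    return merged


# The bigwig rule gets its own settings; a rule without an entry gets the defaults.
print(resources_for("deeptools_make_bigwigs", cluster_config))
print(resources_for("some_rule_without_an_entry", cluster_config))
```

Under that assumption, the bigwig job would be submitted with 6 cores and 4G, matching the threads: 6 setting in the rule.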
I run everything to do with this script in a conda virtual environment and used conda to install all my packages, which all appear to be compatible. I have a feeling this is something simple but I'm just not seeing what it is.
Any suggestions would be greatly appreciated.
Does whatever scheduler you're using export paths and other environment variables? Is deepTools in the same environment as snakemake?
Yes and yes. All the packages I use for this are in a specific environment for this script. The environment that is active when the script is sent to the scheduler is used as the job's default environment, variables included. I have about 12 separate rules using similar path/environment variables, so I assumed it was an issue with how I was assigning cores rather than an issue with deepTools/Snakemake.
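One thing worth checking on the PATH question: exit status 127 is what a POSIX shell usually returns when it cannot find the command, which would point to bamCoverage not being visible inside the cluster job rather than to the thread settings. Here is a minimal sketch of a check that could be run from within a job on a compute node, assuming Python 3 is available there; check_tool is a hypothetical helper name, not part of Snakemake or deepTools.

```python
import shutil
import subprocess


def check_tool(name):
    """Return (resolved path or None, exit status of `name --version`)."""
    # Where, if anywhere, the current PATH resolves the executable.
    path = shutil.which(name)
    # Run through a shell (/bin/sh here; Snakemake's shell directive uses bash
    # by default). A shell reports 127 when the command cannot be found.
    result = subprocess.run(
        f"{name} --version",
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return path, result.returncode


if __name__ == "__main__":
    path, status = check_tool("bamCoverage")
    print(f"bamCoverage resolved to: {path}")
    print(f"exit status of 'bamCoverage --version': {status}")
```

If this reports no path and status 127 on the compute node but works on the login node, the job is most likely not inheriting the conda environment's PATH.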