I am running an ATAC-seq pipeline that include rules for LD score regression and Partitioned heritability and I need the execution order of these rules to be sequential.
The ldsr
rule runs fine, but I'm having issue with the partitioned_heritability
rule as I need it to run after all the files in the ldsr
rule have been generated.
In normal circumstances, I use the rules.ldsr.output
function (labelled #1 in the Snakefile) to inform snakemake about rule execution order, as described here, but this is not possible here as the {chr}
wildcard associated with the lsdr
output is not included in the output of the partitioned_heritability
rule, so the {chr}
wildcard cannot be backfilled in the partitioned_heritability
rule input . This means I get the error:
Wildcards in input files cannot be determined from output files: 'chr'
So I tried to use a flag file
in the output of the ldsr
rule and then call this in the input of the partitioned_heritability
rule , see here for info on flag files, but this threw this error:
positional argument follows keyword argument
I thought this was probs because I was using named and unnamed inputs in the input/output calls, so also tried applying variable names to the ldsr
outputs, i.e. touch_file = touch("mytask.done")
and calling rules.ldsr.output.touch_file
in the partitioned_heritability
input but got the same error.
Also tried using check_call
with the flag file
, see #3
, but that didn't work either. Tbh, I have never used the flag_file
or the check_call
options before so may not have understood, or be using, them properly.
Here are the files:
Snakefile:
import os
# read config info into this namespace
configfile: "config.yaml"
rule all:
input:
expand("LD_annotation_files/Bulk.{chr}.l2.ldscore.gz", chr=range(1,23)),
expand("partHerit_files2/PrtHrt_Test.{GWASsumstat}", GWASsumstat = config['SUMSTATS'])
rule ldsr:
input:
annot = "LD_annotation_files/Bulk.{chr}.annot.gz",
bfile_folder = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles",
snps_folder = "ldsc/reference_files/hapmap3_snps"
output:
"LD_annotation_files/Bulk.{chr}.l2.ldscore.gz",
#2 touch("mytask.done") # Required to enforce rule execution order i.e. PH after lsdr?
conda:
"envs/environment.yml"
params:
bfile = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles/1000G.EUR.QC.{chr}",
ldscores = "LD_annotation_files/ManuBulk.{chr}",
snps = "ldsc/reference_files/w_hm3.snplist_rsIds"
log:
"logs/LDSC/Bulk.{chr}_ldsc.txt"
shell:
"export MKL_NUM_THREADS=2;" # Export arguments are workaround as ldsr uses all available cores
"export NUMEXPR_NUM_THREADS=2;" # Numbers must match cores parameter in cluster config
"Reference/ldsc/ldsc.py --l2 --bfile {params.bfile} --ld-wind-cm 1 "
"--annot {input.annot} --out {params.ldscores} --print-snps {params.snps} 2> {log}"
#3 check_call(["snakemake", "--snakefile", "mytask.done"])
rule partitioned_heritability:
input:
GWASsumstat = lambda wildcards: config['SUMSTATS'][wildcards.GWASsumstat],
#1 LDSR = rules.ldsr.output
#2 "mytask.done"
output:
"partHerit_files2/PrtHrt_Test.{GWASsumstat}"
conda:
"envs/environment.yml"
params:
weights = "ldsc/reference_files/weights_hm3_no_hla/weights.",
baseline = "ldsc/reference_files/baselineLD_v2.2_1000G_Phase3/baselineLD.",
frqfile = "ldsc/reference_files/1000G_Phase3_frq/1000G.EUR.QC.",
LD_anns = "LD_annotation_files/Bulk."
log:
"logs/PrtHerit/Bulk.PrtHrt.{GWASsumstat}.txt"
shell:
"python ldsc/ldsc.py --h2 {input.GWASsumstat} --w-ld-chr {params.weights}"
"--ref-ld-chr {params.baseline},{params.LD_anns} --overlap-annot"
"--frqfile-chr {params.frqfile} --out {output} --print-coefficients"
-coefficients"
Config File:
SUMSTATS:
ADHD: GWAS_sumstats/munged_sumstats/adhd_LDSC.sumstats.gz
SCZ: GWAS_sumstats/munged_sumstats/scz_LDSC.sumstats.gz
MDD: GWAS_sumstats/munged_sumstats/mdd_LDSC.sumstats.gz
ASD: GWAS_sumstats/munged_sumstats/asd_LDSC.sumstats.gz
BIP: GWAS_sumstats/munged_sumstats/bip_LDSC.sumstats.gz
Any help/advice/solutions with this would be greatly appreciated.
@EricLim - Many Thanks.