Snakemake: Force execution of rules in a specific order - ignoring wildcards
1
0
Entering edit mode
4.8 years ago
camerond ▴ 190

I am running an ATAC-seq pipeline that include rules for LD score regression and Partitioned heritability and I need the execution order of these rules to be sequential.

The ldsr rule runs fine, but I'm having issue with the partitioned_heritability rule as I need it to run after all the files in the ldsr rule have been generated.

In normal circumstances, I use the rules.ldsr.output function (labelled #1 in the Snakefile) to inform snakemake about rule execution order, as described here, but this is not possible here as the {chr} wildcard associated with the lsdr output is not included in the output of the partitioned_heritability rule, so the {chr} wildcard cannot be backfilled in the partitioned_heritability rule input . This means I get the error:

Wildcards in input files cannot be determined from output files: 'chr'

So I tried to use a flag file in the output of the ldsr rule and then call this in the input of the partitioned_heritability rule , see here for info on flag files, but this threw this error:

positional argument follows keyword argument

I thought this was probs because I was using named and unnamed inputs in the input/output calls, so also tried applying variable names to the ldsr outputs, i.e. touch_file = touch("mytask.done") and calling rules.ldsr.output.touch_file in the partitioned_heritability input but got the same error.

Also tried using check_call with the flag file, see #3, but that didn't work either. Tbh, I have never used the flag_file or the check_call options before so may not have understood, or be using, them properly.

Here are the files:

Snakefile:

import os
# read config info into this namespace
configfile: "config.yaml"

rule all:
    input:
        expand("LD_annotation_files/Bulk.{chr}.l2.ldscore.gz", chr=range(1,23)),
        expand("partHerit_files2/PrtHrt_Test.{GWASsumstat}", GWASsumstat = config['SUMSTATS'])

rule ldsr:
    input:
        annot = "LD_annotation_files/Bulk.{chr}.annot.gz",
        bfile_folder = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles",
        snps_folder = "ldsc/reference_files/hapmap3_snps"
    output:
        "LD_annotation_files/Bulk.{chr}.l2.ldscore.gz",
#2      touch("mytask.done") # Required to enforce rule execution order i.e. PH after lsdr?
    conda:
        "envs/environment.yml"
    params:
        bfile = "ldsc/reference_files/1000G_EUR_Phase3_plinkfiles/1000G.EUR.QC.{chr}",
        ldscores = "LD_annotation_files/ManuBulk.{chr}",
        snps = "ldsc/reference_files/w_hm3.snplist_rsIds"
    log:
        "logs/LDSC/Bulk.{chr}_ldsc.txt"
    shell:
        "export MKL_NUM_THREADS=2;" # Export arguments are  workaround as ldsr uses all available cores
        "export NUMEXPR_NUM_THREADS=2;" # Numbers must match cores parameter in cluster config
        "Reference/ldsc/ldsc.py --l2 --bfile {params.bfile} --ld-wind-cm 1 "
        "--annot {input.annot} --out {params.ldscores} --print-snps {params.snps} 2> {log}"

#3 check_call(["snakemake", "--snakefile", "mytask.done"])

rule partitioned_heritability:
    input:
        GWASsumstat = lambda wildcards: config['SUMSTATS'][wildcards.GWASsumstat],
#1        LDSR = rules.ldsr.output
#2        "mytask.done"
    output:
        "partHerit_files2/PrtHrt_Test.{GWASsumstat}"
    conda:
        "envs/environment.yml"
    params:
        weights = "ldsc/reference_files/weights_hm3_no_hla/weights.",
        baseline = "ldsc/reference_files/baselineLD_v2.2_1000G_Phase3/baselineLD.",
        frqfile = "ldsc/reference_files/1000G_Phase3_frq/1000G.EUR.QC.",
        LD_anns = "LD_annotation_files/Bulk."
    log:
        "logs/PrtHerit/Bulk.PrtHrt.{GWASsumstat}.txt"
    shell:
        "python ldsc/ldsc.py --h2 {input.GWASsumstat} --w-ld-chr {params.weights}"
        "--ref-ld-chr {params.baseline},{params.LD_anns} --overlap-annot"
        "--frqfile-chr {params.frqfile} --out {output} --print-coefficients"
        -coefficients"

Config File:

SUMSTATS:
    ADHD: GWAS_sumstats/munged_sumstats/adhd_LDSC.sumstats.gz
    SCZ: GWAS_sumstats/munged_sumstats/scz_LDSC.sumstats.gz
    MDD: GWAS_sumstats/munged_sumstats/mdd_LDSC.sumstats.gz
    ASD: GWAS_sumstats/munged_sumstats/asd_LDSC.sumstats.gz
    BIP: GWAS_sumstats/munged_sumstats/bip_LDSC.sumstats.gz

Any help/advice/solutions with this would be greatly appreciated.

snakemake python rule execution • 4.3k views
ADD COMMENT
4
Entering edit mode
4.8 years ago
Eric Lim ★ 2.2k

Use LDSR = expand(rules.ldsr.output, chr=range(1,23)) in your partitioned_heritability rule to resolve wildcard {chr} and to ensure the rule can only be started after ldsr processes all chromosomes.

ADD COMMENT
0
Entering edit mode

@EricLim - Many Thanks.

ADD REPLY

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6