Snakemake flow variation
Asked by effidotpy, 4.7 years ago

Hi, I have been able to create my own linear workflows (A -> B -> C -> D) with Snakemake. Now I would like to include some optional steps (X) that should only be executed if the user asks for them. Most of the time C takes the output of B as its input, but sometimes I need X to take the output of B as its input, and then C to take the output of X.


After reading the documentation I have not been able to figure out how to do this. I don't even know if it is feasible, or if there is another approach that fits better. I would appreciate some guidance here.

Thanks!

Answer by russhh, 4.7 years ago

The input to C can be a function of the wildcards in C's output: Snakemake calls the function with those wildcards and uses whatever path(s) it returns as C's input. So you can write a function that decides what the input to C should be.

When you say that "sometimes I would need X to take as input the output from B and then C to take the output from X", do you mean that the use of X is decided per sample within your workflow (sample 1 might pass through X, but sample 2 might not), or that the whole workflow should optionally use X based on some config/argument (for a given experiment, run all samples through X)?

Something like this:

# Use only ONE of the two input_for_c definitions below;
# a later definition silently replaces an earlier one.

# Option 1: optionally use X for every sample passing through the workflow
def input_for_c(wildcards):
    # requires a config (e.g. loaded via `configfile:`) with a workflow-wide "Use X" switch
    if config["Use X"]:
        return "./data/X/{}".format(wildcards.sample_id)
    else:
        return "./data/B/{}".format(wildcards.sample_id)

# Option 2: optionally use X for the current sample only
def input_for_c(wildcards):
    # requires a `sample_config` dict (defined elsewhere) with a "Use X" switch per sample
    sample_id = wildcards.sample_id
    if sample_config[sample_id]["Use X"]:
        return "./data/X/{}".format(sample_id)
    else:
        return "./data/B/{}".format(sample_id)

rule all:
    # SAMPLES is a list of sample IDs defined elsewhere in the Snakefile
    input: expand("./data/C/{sample_id}", sample_id=SAMPLES)

rule B:
    output: "./data/B/{sample_id}"
    # input and shell/script directives omitted

rule X:
    input: "./data/B/{sample_id}"
    output: "./data/X/{sample_id}"
    # shell/script directive omitted

rule C:
    input: input_for_c
    output: "./data/C/{sample_id}"
    # shell/script directive omitted
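
Because the input function is plain Python, its per-sample branching can be checked outside Snakemake. A minimal sketch, assuming a hypothetical `sample_config` dict keyed by sample ID, and using `types.SimpleNamespace` as a stand-in for the `wildcards` object Snakemake would pass in (only the `sample_id` attribute is needed here):

```python
from types import SimpleNamespace

# Hypothetical per-sample switches; in a real Snakefile these might come
# from a sample sheet or from the config file.
sample_config = {
    "s1": {"Use X": True},
    "s2": {"Use X": False},
}


def input_for_c(wildcards):
    """Return the file rule C should read for this sample."""
    sample_id = wildcards.sample_id
    if sample_config[sample_id]["Use X"]:
        return "./data/X/{}".format(sample_id)
    return "./data/B/{}".format(sample_id)


# SimpleNamespace stands in for Snakemake's wildcards object.
for sid in sample_config:
    print(sid, "->", input_for_c(SimpleNamespace(sample_id=sid)))
```

Since Snakemake builds the job graph backwards from the requested files, rule X is only scheduled for samples whose C input points into ./data/X/.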

This looks a bit complicated. Off the top of my head, I think you could also do something like:

rule C:
    input: 
        lambda wildcards: "./data/X/"+wildcards.sample_id if config["Use X"] else "./data/B/"+wildcards.sample_id
    output: 
        "./data/C/{sample_id}"
    # shell/script directive omitted

You'd only include one of the two functions in your actual Snakefile. IMO it's better to have a named function than to stuff the same logic into a lambda.
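
To illustrate the lambda-versus-function point: the one-liner from the reply above and a named function return the same paths, but the named function is easier to read and to call directly when checking it. A sketch assuming a plain dict in place of Snakemake's `config` and `SimpleNamespace` in place of `wildcards`:

```python
from types import SimpleNamespace

# Stand-in for Snakemake's global `config` dict (assumption for this sketch).
config = {"Use X": True}

# The lambda version from the reply; the whole if/else is the lambda body.
input_for_c_lambda = (
    lambda wildcards: "./data/X/" + wildcards.sample_id
    if config["Use X"]
    else "./data/B/" + wildcards.sample_id
)


def input_for_c(wildcards):
    """Named equivalent: read C's input from X if the switch is on, else from B."""
    step = "X" if config["Use X"] else "B"
    return "./data/{}/{}".format(step, wildcards.sample_id)


# Both read `config` at call time, so flipping the switch affects both.
for use_x in (True, False):
    config["Use X"] = use_x
    wc = SimpleNamespace(sample_id="s1")
    assert input_for_c_lambda(wc) == input_for_c(wc)
    print(use_x, input_for_c(wc))
```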


Your help did the trick. Thanks, mates!

Regarding your question, russhh: I want to use this "switch" for the whole experiment, i.e. to process all of its samples the same way. The point is that I also want to use this workflow for other experiments that might require some extra intermediate steps.
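
For reusing one workflow across experiments that need different intermediate steps, one option is to list the steps in pipeline order in the config and have each rule's input function pick the nearest enabled step upstream. A minimal sketch; the `steps` list, its `enabled` flags, the extra optional step Y, and the helper name `upstream_output` are all hypothetical, not Snakemake features:

```python
from types import SimpleNamespace

# Hypothetical config: steps in pipeline order, each with an on/off switch.
# B and C are always on; X and Y are optional, experiment-specific steps.
config = {
    "steps": [
        {"name": "B", "enabled": True},
        {"name": "X", "enabled": True},
        {"name": "Y", "enabled": False},
        {"name": "C", "enabled": True},
    ]
}


def upstream_output(step_name, wildcards):
    """Return the output path of the nearest enabled step before `step_name`."""
    names = [s["name"] for s in config["steps"]]
    idx = names.index(step_name)
    # Walk backwards starting from the step just before `step_name`.
    for step in reversed(config["steps"][:idx]):
        if step["enabled"]:
            return "./data/{}/{}".format(step["name"], wildcards.sample_id)
    raise ValueError("no enabled step upstream of " + step_name)


wc = SimpleNamespace(sample_id="s1")
print(upstream_output("C", wc))  # prints ./data/X/s1 (Y is disabled)
```

In a Snakefile, rule C could then declare `input: lambda wildcards: upstream_output("C", wildcards)`, and each optional rule would do the same with its own name.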

Answer by gb, 4.7 years ago

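A Snakefile is basically Python code, so you could wrap the rule definition in an if/else, like:

if sample_config["UseX"]:
    # rule x only exists when the switch is on
    rule x:
        input: "./data/B/{sample_id}"
        output: "./data/X/{sample_id}"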

Or maybe the docs section on data-dependent conditional execution can help: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#data-dependent-conditional-execution


Are you sure you can do that (optional rule definition) on a sample-by-sample basis?
