Question

Snakemake: how to define output from a dictionary according to wildcard

1

9 months ago

ema ▴ 10

Hello,

Is it possible to define a Snakemake rule's output by looking up a wildcard in a dictionary? I have a dictionary, let's call it 'transform', whose keys are {sample} values and whose values are the new names. I want to write a Snakemake rule like this example:

rule example:
    input:
        i = "path/{sample}.vcf"
    output:
        o = "path/transform[{sample}].vcf"
    shell:
        """
        somecommand -i {input.i} -o {output.o}
        """

I've been looking around the internet for similar questions that have been resolved, but their solutions seem too specific, and I'm unsure what would work best. Reading through Snakemake's documentation has left me more confused about what is accepted for output and params.

Right now I'm looking at it like this:

input: i = "path/{sample}.vcf"
output: o = "path/{params.value}.vcf"
params: value = lambda wcs: transform[wcs.sample]

Does anyone know how I could make this work?

Thank you.

wildcards snakemake dictionary • 878 views

ADD COMMENT • link 8 months ago by ema ▴ 10

1

The way Snakemake works is that you write implicit wildcard rules and then ask for explicit outputs.

So you can definitely map arbitrary inputs and outputs using a lambda function and a dictionary, but in the end you will still need to ask for the outputs explicitly.

Your transform function must take an output and generate the correct input, not the other way around.
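As a rough illustration of that direction (output name in, input path out), here is a plain-Python sketch. The dictionary contents, the function name `input_for`, and the `targets` list are assumptions made for the example; `SimpleNamespace` only stands in for the wildcards object that Snakemake would pass to an input function.

```python
from types import SimpleNamespace

# Hypothetical mapping: requested output name -> original sample ID.
transform = {"new_name1": "sample1", "new_name2": "sample2"}

def input_for(wildcards):
    """Given the wildcards matched on the output, return the input path."""
    return "path/%s.vcf" % transform[wildcards.newname]

# The explicit outputs to ask for, e.g. as the input of a top-level target rule.
targets = ["path/%s.vcf" % newname for newname in transform]

# Stand-in for the wildcards object Snakemake would supply.
wc = SimpleNamespace(newname="new_name1")
print(input_for(wc))
print(targets)
```

In a Snakefile, a function like `input_for` would be given as the rule's input, and a list like `targets` would be what the workflow is asked to produce.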

ADD REPLY • link 9 months ago by Jeremy Leipzig 22k

0

Thank you for your answer! You're right, I was thinking about this the wrong way: I had input and output mixed up.

ADD REPLY • link 8 months ago by ema ▴ 10

score 2 · Accepted Answer · 2024-03-02

2

9 months ago

dariober 15k

I would invert the dictionary so that the key is the new name and the value is the sample ID. If multiple IDs map to the same new name, the same output would be generated by multiple input files, which is probably not what you want. Then use newname to capture the corresponding input {sample_id}, something like:

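# Inverted dictionary: new name -> original sample ID
transform = {"new_name1": "sample1", "new_name2": "sample2"}

rule example:
    input:
        # Look up the original sample from the requested output name
        i = lambda wc: "path/%s.vcf" % transform[wc.newname],
    output:
        o = "path/{newname}.vcf",
    # Command carried over from the question
    shell:
        "somecommand -i {input.i} -o {output.o}"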

Downstream rules will then depend on wildcard {newname}.
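If the dictionary already exists in the sample-to-new-name direction, the inversion and the duplicate-name caveat mentioned above could be handled with something like the following plain-Python sketch. The helper name `invert_unique` and the example dictionary `sample_to_newname` are hypothetical, chosen only for illustration.

```python
def invert_unique(mapping):
    """Invert sample -> new name into new name -> sample, rejecting duplicates."""
    inverted = {}
    for sample, newname in mapping.items():
        if newname in inverted:
            # Two samples would write to the same output file.
            raise ValueError(
                "new name %r is shared by samples %r and %r"
                % (newname, inverted[newname], sample)
            )
        inverted[newname] = sample
    return inverted

# Hypothetical original dictionary, keyed by sample ID as in the question.
sample_to_newname = {"sample1": "new_name1", "sample2": "new_name2"}
transform = invert_unique(sample_to_newname)
```

Failing early on a shared new name is one possible design choice; it surfaces the conflict when the Snakefile is parsed rather than silently keeping only one of the samples.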

0

Thank you for this answer. The code works!

ADD REPLY • link 8 months ago by ema ▴ 10