Question

MissingInputException in snakemake workflow

6.0 years ago

jmat • 0

Hello all,
I'm testing this simple snakemake pipeline.

import pandas as pd
configfile: "config.yaml"

units = pd.read_csv(
    config["units"], dtype=str, sep="\t").set_index(["sample", "unit"], drop=False)

units.index = units.index.set_levels(
    [i.astype(str) for i in units.index.levels])

def get_fastqs(wildcards):
    """Get raw FASTQ files from unit sheet."""

    return units.loc[
        (wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()

rule all:
    input:
        expand('test/{unit.sample}-paired_reads', unit=units.itertuples())

rule getfastas:
    input:
        get_fastqs
    output:
        "test/{unit.sample}-paired_reads.txt"
    shell:
        "echo {input} > {output}"

and it fails with the following error:

MissingInputException in line 16 of path/test.smk:
Missing input files for rule all:
test/SRR3396382-paired_reads
test/SRR3396381-paired_reads

The pandas dataframe units looks like this:

In rule all, unit=units.itertuples() yields Python named tuples, and {unit.sample} reads the string stored in the sample column of each one. That is where the SRR339638x names in the missing files come from.
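To make the origin of those target names concrete, here is a minimal sketch in plain Python. It assumes that expand fills its template the way str.format does, resolving {unit.sample} as attribute access on each row. The Row tuple is a hypothetical stand-in for what units.itertuples() yields; the unit values and FASTQ names are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for rows from units.itertuples().
# The unit values and FASTQ file names are made up for illustration.
Row = namedtuple("Row", ["sample", "unit", "fq1", "fq2"])
rows = [
    Row("SRR3396381", "1", "SRR3396381_1.fq", "SRR3396381_2.fq"),
    Row("SRR3396382", "1", "SRR3396382_1.fq", "SRR3396382_2.fq"),
]

# Same template as in rule all: "unit" is the keyword, ".sample" is an attribute lookup.
template = "test/{unit.sample}-paired_reads"
targets = [template.format(unit=row) for row in rows]

for target in targets:
    print(target)
```

The resulting names carry no .txt suffix, while the output of rule getfastas ends in -paired_reads.txt, so under this reading no rule's output can match the requested targets.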

How can I use get_fastqs to produce the missing files? I think this comes from my misunderstanding of some, maybe basic, Snakemake functionality.

snakemake • 4.5k views

ADD COMMENT • link updated 6.0 years ago by Medhat 9.8k • written 6.0 years ago by jmat • 0

In the function that you are passing wildcards to, the name of the variable is unit.sample, not unit, so wildcards.sample and wildcards.unit do not exist.

ADD REPLY • link 6.0 years ago by Medhat 9.8k

Hi Medhat, thanks for your time.

I don't understand what you are referring to. Can you elaborate a bit more?
As I understand it, unit.sample is attribute access on a named tuple (valid Python): the variable is named unit and has an attribute named sample. That's why, in rule all, {unit.sample} produces the SRR339638x strings. To put it another way, it's a wildcard that has two 'named' attributes, "unit" and "sample".

I edited my original post. It now has a simpler example, but the scenario is exactly the same.

ADD REPLY • link 6.0 years ago by jmat • 0

My bad, it was not clear previously.

ADD REPLY • link 6.0 years ago by Medhat 9.8k

score 0 · Answer 1 · 2019-07-26

6.0 years ago

Medhat 9.8k

Please change 'test/{unit.sample}-paired_reads', unit=units.itertuples().

The variable for expand is named unit.sample, but you are calling it unit. It should be: 'test/{unit.sample}-paired_reads', unit.sample=units.itertuples().

Also, the wildcard has a variable called unit.sample, not just sample, so to use it in the function it is now called wildcard.unit.sample.sample

ADD COMMENT • link 6.0 years ago by Medhat 9.8k

When I do this:

expand('test/{unit.sample}-paired_reads', unit.sample=units.itertuples())

I get this error:

SyntaxError in line 18 of path/test.smk:
keyword can't be an expression

The expansion in rule all is fine: it works as expected and, as the Snakemake error shows, produces the expected file names (strings). itertuples() returns named tuples whose attribute is sample, which is why {unit.sample} works; the .sample part of unit.sample is attribute access on unit. My problem is passing the proper wildcards to my input function in rule getfastas.

ADD REPLY • link 6.0 years ago by jmat • 0
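One possible way out, sketched in plain Python rather than as a tested Snakefile: Python keyword arguments must be plain identifiers, which is why unit.sample=... is a SyntaxError, so the dotted lookup can only live inside the expand template. The rule output, in turn, would use plain wildcard names ({sample} and {unit}) so that wildcards.sample and wildcards.unit exist when get_fastqs is called, and both strings would end in .txt. Everything below is an assumption for illustration: pattern_to_regex is a simplified emulation of wildcard matching and not Snakemake's own code, the FASTQS dict stands in for the units DataFrame, and the unit value "1" and the FASTQ paths are invented.

```python
import re
from collections import namedtuple
from types import SimpleNamespace

# Hypothetical unit sheet: the unit value "1" and the FASTQ paths are invented.
Row = namedtuple("Row", ["sample", "unit", "fq1", "fq2"])
ROWS = [
    Row("SRR3396381", "1", "raw/SRR3396381_1.fq", "raw/SRR3396381_2.fq"),
    Row("SRR3396382", "1", "raw/SRR3396382_1.fq", "raw/SRR3396382_2.fq"),
]
# Stand-in for units.loc[(sample, unit), ["fq1", "fq2"]].
FASTQS = {(r.sample, r.unit): [r.fq1, r.fq2] for r in ROWS}

# Template for rule all: attribute lookups on each row, with the .txt suffix.
TARGET_TEMPLATE = "test/{unit.sample}-{unit.unit}-paired_reads.txt"
# Output pattern for rule getfastas: plain wildcard names, no dots.
OUTPUT_PATTERN = "test/{sample}-{unit}-paired_reads.txt"


def pattern_to_regex(pattern):
    """Simplified emulation: turn each {name} into a named regex group."""
    # With one capture group, re.split alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = [
        "(?P<%s>.+)" % part if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    ]
    return re.compile("".join(pieces) + "$")


def get_fastqs(wildcards):
    """Look up the FASTQ pair using the two wildcards the output pattern defines."""
    return FASTQS[(wildcards.sample, wildcards.unit)]


targets = [TARGET_TEMPLATE.format(unit=row) for row in ROWS]
regex = pattern_to_regex(OUTPUT_PATTERN)

resolved = {}
for target in targets:
    match = regex.match(target)
    wildcards = SimpleNamespace(**match.groupdict())
    resolved[target] = get_fastqs(wildcards)

for target, fastqs in resolved.items():
    print(target, "<-", " ".join(fastqs))
```

In the Snakefile this would correspond to using TARGET_TEMPLATE as the expand template in rule all (still with unit=units.itertuples()) and OUTPUT_PATTERN as the output of rule getfastas, leaving get_fastqs unchanged. Whether the unit column belongs in the file name depends on the real unit sheet, which is not shown in the thread.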