Hello all,
I'm testing this simple snakemake pipeline:
import pandas as pd

configfile: "config.yaml"

units = pd.read_csv(
    config["units"], dtype=str, sep="\t").set_index(["sample", "unit"], drop=False)
units.index = units.index.set_levels(
    [i.astype(str) for i in units.index.levels])

def get_fastqs(wildcards):
    """Get raw FASTQ files from unit sheet."""
    return units.loc[
        (wildcards.sample, wildcards.unit), ["fq1", "fq2"]].dropna()

rule all:
    input:
        expand('test/{unit.sample}-paired_reads', unit=units.itertuples())

rule getfastas:
    input:
        get_fastqs
    output:
        "test/{unit.sample}-paired_reads.txt"
    shell:
        "echo {input} > {output}"

and it fails with the following error:
MissingInputException in line 16 of path/test.smk:
Missing input files for rule all:
test/SRR3396382-paired_reads
test/SRR3396381-paired_reads
The pandas dataframe units looks like this:

In rule all, unit=units.itertuples() returns Python named tuples, from which I take the string value of the sample column, so the SRR339638x names in the missing files come from there.
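To make that expansion concrete, here is a minimal sketch in plain Python. It only mimics the substitution with str.format and is not Snakemake's expand itself. The Unit named tuple is a hypothetical stand-in for one row of units.itertuples(); the sample names are taken from the error message, while the unit value "1" is made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows yielded by units.itertuples().
# Sample names come from the error message; the unit values are invented.
Unit = namedtuple("Unit", ["sample", "unit"])
rows = [Unit("SRR3396382", "1"), Unit("SRR3396381", "1")]

pattern = "test/{unit.sample}-paired_reads"

# str.format resolves "unit.sample" as attribute access on the
# keyword argument named "unit".
targets = [pattern.format(unit=row) for row in rows]
print(targets)
# → ['test/SRR3396382-paired_reads', 'test/SRR3396381-paired_reads']
```

Note that the resulting names have no .txt suffix, whereas the output of rule getfastas does.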
How can I use get_fastqs to produce the missing files? I think this is related to my misunderstanding of some, maybe basic, snakemake functionality.

In the function that you are passing wildcards to, the name of the variable is unit.sample, not unit. So wildcards.sample and wildcards.unit do not exist.

Hi Medhat, thanks for your time.
I don't understand what you are referring to. Can you elaborate a bit more?
As I understand it, unit.sample comes from a named tuple (valid Python): the variable is named unit and it has an attribute named sample. That's why, in rule all, the wildcard {unit.sample} produces the SRR339638x strings. To explain it better, it's a wildcard that has two 'named' attributes, "unit" and "sample".

I edited my original post. I'm now posting a simpler example, but it is exactly the same scenario.
My bad, it was not clear previously.
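
For anyone landing here later, the mismatch can be illustrated with a toy matcher in plain Python. This is only a sketch under assumptions: match_output is a hypothetical helper, not Snakemake's own matching logic (which also handles wildcard constraints and more), and the "test/{sample}-{unit}-paired_reads.txt" layout and the unit value "1" are made up for illustration.

```python
import re

def match_output(pattern, target):
    """Toy matcher: return wildcard values if target fits pattern, else None."""
    # Split into literal parts (even indices) and wildcard names (odd indices).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

# What rule all requests, copied from the error message.
requested = "test/SRR3396382-paired_reads"
# The getfastas output with the wildcard renamed to a plain {sample}.
original = "test/{sample}-paired_reads.txt"
# Hypothetical layout that carries both values get_fastqs needs.
proposed = "test/{sample}-{unit}-paired_reads.txt"

no_match = match_output(original, requested)             # .txt is missing
only_sample = match_output(original, requested + ".txt")  # no unit wildcard
both = match_output(proposed, "test/SRR3396382-1-paired_reads.txt")
print(no_match, only_sample, both)
# → None {'sample': 'SRR3396382'} {'sample': 'SRR3396382', 'unit': '1'}
```

Two things stand out in this sketch: the target requested by rule all lacks the .txt suffix, so no output pattern can produce it, and get_fastqs reads wildcards.sample and wildcards.unit, so the output path probably needs wildcards with exactly those names. Under that assumption, rule all might request something like expand('test/{unit.sample}-{unit.unit}-paired_reads.txt', unit=units.itertuples()).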