Hello hive mind,
I'm having trouble scaling a Snakemake workflow from one sample to multiple samples. I have this workflow working for processing amplicon targets from a single sample, and it worked great:
```python
rule all:
    input:
        "data/samples/LS19-3433-21_s1_de_novo"

rule seqtk_qualtiy_filter:
    input:
        "data/samples/LS19-3433-21_s1.IonXpress_035.2019-09-04T20_14_16Z.fastq"
    output:
        temp("data/samples/LS19-3433-21_s1.qtrim.fastq")
    shell:
        "seqtk trimfq -b 0.01 {input} > {output}"

rule seqtk_clip:
    input:
        "data/samples/LS19-3433-21_s1.qtrim.fastq"
    output:
        "data/samples/LS19-3433-21_s1.clean.fastq"
    shell:
        "seqtk trimfq -b20 -L 350 {input} > {output}"

rule bbnorm:
    input:
        "data/samples/LS19-3433-21_s1.clean.fastq"
    output:
        "data/samples/LS19-3433-21_s1.norm.fastq"
    shell:
        "bbnorm.sh -Xmx10g in={input} out={output} target=50"

rule spades:
    input:
        "data/samples/LS19-3433-21_s1.norm.fastq"
    output:
        # SPAdes writes a whole directory, so flag the output with directory()
        directory("data/samples/LS19-3433-21_s1_de_novo")
    shell:
        "spades.py --iontorrent --only-assembler -s {input} -k 21,33,55,77,99,127 -o {output}"
```
So now I'm trying to generalize it. The documentation says I can use a config.yaml to list my samples, which I have done:
```yaml
samples:
  LS19-3512-1: data/LS19-3512-1.IonXpress_060.2019-11-01T13_49_02Z.fastq
  LS19-3512-2: data/LS19-3512-2.IonXpress_061.2019-11-01T13_49_02Z.fastq
  LS19-3512-3: data/LS19-3512-3.IonXpress_062.2019-11-01T13_49_02Z.fastq
  LS19-3512-5: data/LS19-3512-5.IonXpress_063.2019-11-01T13_49_02Z.fastq
  LS19-3512-6: data/LS19-3512-6.IonXpress_064.2019-11-01T13_49_02Z.fastq
  LS19-3512-8: data/LS19-3512-8.IonXpress_085.2019-11-01T13_49_02Z.fastq
  LS19-3512-9: data/LS19-3512-9.IonXpress_086.2019-11-01T13_49_02Z.fastq
```
Next I tried to load the samples from the YAML file as the documentation shows, like so:
```python
configfile: "config.yaml"

print("Starting amplicon analysis workflow")

rule seqtk_qualtiy_filter:
    input:
        expand("{sample}", sample=config["samples"])
    output:
        temp("data/{sample}.qtrim.fq")
    shell:
        "seqtk trimfq -b 0.01 {input} > {output}"
```
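One thing worth checking in the rule above: `config["samples"]` is a dictionary, and iterating a dictionary in Python yields its keys. So `expand("{sample}", sample=config["samples"])` most likely fills `{sample}` with the sample names rather than the fastq paths, and puts every sample into the input of a single job. The plain-Python sketch below imitates that substitution; `expand_one` is a hypothetical, simplified stand-in written for illustration, not Snakemake's own `expand`, and only two samples from the config are shown.

```python
# Hypothetical stand-in for what Snakemake loads from config.yaml:
# a dict mapping sample name -> fastq path (two samples shown).
config = {
    "samples": {
        "LS19-3512-1": "data/LS19-3512-1.IonXpress_060.2019-11-01T13_49_02Z.fastq",
        "LS19-3512-2": "data/LS19-3512-2.IonXpress_061.2019-11-01T13_49_02Z.fastq",
    }
}


def expand_one(pattern, values):
    """Simplified, hypothetical imitation of expand() for one {sample} wildcard."""
    return [pattern.format(sample=value) for value in values]


# Iterating over a dict yields its keys, so these are names, not file paths.
names = expand_one("{sample}", config["samples"])
print(names)  # ['LS19-3512-1', 'LS19-3512-2']

# Looking each name up in the dict gives the fastq paths instead.
paths = [config["samples"][name] for name in names]
print(paths)
```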
and I get the following error:
```
sean@LEN943:~/Desktop/cobb_reo_3512$ snakemake -s amplicon_snakefile -np
Starting amplicon analysis workflow
Building DAG of jobs...
WorkflowError:
Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards.
```
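As far as I understand Snakemake, when no target is given on the command line it treats the first rule in the Snakefile as the target, and it fills in wildcards by matching each requested file name against a rule's output pattern. Here the first rule's own output, `data/{sample}.qtrim.fq`, still contains `{sample}`, so there is no concrete file to match and the wildcard cannot be determined. The plain-Python sketch below imitates only that matching step; `pattern_to_regex` and `infer_wildcards` are hypothetical helpers, and letting every wildcard match `.+` is a simplification of Snakemake's default behaviour.

```python
import re


def pattern_to_regex(pattern):
    """Turn an output pattern like 'data/{sample}.qtrim.fq' into a compiled regex.

    Hypothetical helper: each {name} becomes a named group matching '.+',
    and all other text is matched literally.
    """
    # Splitting on a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between wildcards
        else:
            regex += f"(?P<{part}>.+)"  # a wildcard name
    return re.compile(regex + "$")


def infer_wildcards(pattern, requested_file):
    """Return the wildcard values if requested_file fits pattern, else None."""
    match = pattern_to_regex(pattern).match(requested_file)
    return match.groupdict() if match else None


# A concrete requested file lets {sample} be filled in ...
print(infer_wildcards("data/{sample}.qtrim.fq", "data/LS19-3512-1.qtrim.fq"))
# ... while a file that does not fit the pattern yields no wildcard values.
print(infer_wildcards("data/{sample}.qtrim.fq", "data/LS19-3512-1.fastq"))
```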
I have also tried an input function, `lambda wildcards: config["samples"][wildcards.sample]`, and I get the same error.
I'm sure this is an easy fix, but I don't understand how to do it, since using YAML/JSON config files will always require a wildcard/variable.
Thank you in advance for pointing me in the right direction.
Sean
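The usual pattern described in the Snakemake docs, as far as I understand them, has two parts: a wildcard-free rule placed first (conventionally `rule all`) whose input lists the concrete final files for every sample, and an input function on the first processing rule that maps `wildcards.sample` to that sample's fastq. The plain-Python sketch below imitates just those two lookups. The `data/{sample}.qtrim.fq` target and the name `get_fastq` are assumptions for illustration, and `SimpleNamespace` stands in for the wildcards object Snakemake passes to input functions.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the loaded config.yaml (two samples shown).
config = {
    "samples": {
        "LS19-3512-1": "data/LS19-3512-1.IonXpress_060.2019-11-01T13_49_02Z.fastq",
        "LS19-3512-2": "data/LS19-3512-2.IonXpress_061.2019-11-01T13_49_02Z.fastq",
    }
}

# 1) Concrete targets for a wildcard-free first rule: one file per sample name.
#    (Assumed pattern; in the real workflow this would be the final output.)
targets = ["data/{sample}.qtrim.fq".format(sample=name) for name in config["samples"]]
print(targets)  # ['data/LS19-3512-1.qtrim.fq', 'data/LS19-3512-2.qtrim.fq']


# 2) Input function for the wildcarded rule: one sample name -> its fastq path.
def get_fastq(wildcards):
    return config["samples"][wildcards.sample]


print(get_fastq(SimpleNamespace(sample="LS19-3512-2")))
```

In the Snakefile itself, this would roughly correspond to a `rule all` placed first with `input: expand("data/{sample}.qtrim.fq", sample=config["samples"])`, plus `input: lambda wildcards: config["samples"][wildcards.sample]` in the filtering rule. Treat that as an untested outline rather than a verified workflow.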
For me, this question is fine here, but the developer of Snakemake prefers Stack Overflow.
Thank you for the feedback, Wouter! I will give this a try when I escape the lab today.