Hi, I am a complete newbie at Snakemake. I posted a similar question earlier (how to get around snakemake with different wildcards input to output rule), but I thought I would ask a simplified version.
I have a previous rule (not shown, but it works well) that generates four BAM files. The next rule takes these four files as input and needs to produce one file as output.
I have tried the following:
configfile: "config.yaml"

fc = config["flowcell"]
index = config["index"]
samplename = config["sn"]

rule all:
    input:
        expand(["result/gatk4/{fc}_L01_{index}_piped.bam",
                "result/gatk4/{fc}_L02_{index}_piped.bam",
                "result/gatk4/{fc}_L03_{index}_piped.bam",
                "result/gatk4/{fc}_L04_{index}_piped.bam",
                "result/gatk4/samplename_index{index}_markedduplicates.bam"],
               fc = fc, index = index)

rule MarkDuplicates:
    input:
        L01="result/gatk4/{fc}_L01_{index}_piped.bam",
        L02="result/gatk4/{fc}_L02_{index}_piped.bam",
        L03="result/gatk4/{fc}_L03_{index}_piped.bam",
        L04="result/gatk4/{fc}_L04_{index}_piped.bam"
    output:
        bam="result/gatk4/samplename_index{index}_markedduplicates.bam",
        txt="result/gatk4/samplename_index{index}_markedduplicates_metrics.txt"
    shell:
        """
        gatk --java-options '-Xmx30G' MarkDuplicates
        I={input.L01} I={input.L02} I={input.L03} I= {input.L04}
        O={output.bam}
        M={output.txt}
        TMP_DIR=`pwd`/tmp
        2>{log}
        """
But I get the error:
Wildcards in input files cannot be determined from output files: 'fc'
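As far as I understand it, Snakemake fills in the input wildcards from the values it matched in the output file names, so every wildcard used in the input must also appear in the output. The sketch below only mimics that check with a regular expression and is not how Snakemake implements it. The helper `wildcard_names` is a hypothetical name introduced for illustration.

```python
import re

def wildcard_names(pattern):
    """Return the set of {name} placeholders in a path pattern (hypothetical helper)."""
    return set(re.findall(r"\{(\w+)\}", pattern))

# Patterns copied from the MarkDuplicates rule above.
input_pattern = "result/gatk4/{fc}_L01_{index}_piped.bam"
output_pattern = "result/gatk4/samplename_index{index}_markedduplicates.bam"

# Wildcards used by the input but absent from the output cannot be inferred.
undetermined = wildcard_names(input_pattern) - wildcard_names(output_pattern)
print(undetermined)
```

Here the only wildcard left over is `fc`, which matches the name reported in the error message.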
I have tried looking at similar questions, but as a newbie I can't quite follow the code.
I was wondering if I could make a list and use this as the input to the rule like:
test = ["result/gatk4/V350134767_L01_3_piped.bam",
        "result/gatk4/V350134767_L02_3_piped.bam",
        "result/gatk4/V350134767_L03_3_piped.bam",
        "result/gatk4/V350134767_L04_3_piped.bam"]

rule MarkDuplicates:
    input:
        test
    output:
        bam="result/gatk4/samplename_index{index}_markedduplicates.bam",
        txt="result/gatk4/samplename_index{index}_markedduplicates_metrics.txt"
    shell:
        """
        gatk --java-options '-Xmx30G' MarkDuplicates
        I={test[0]} I={test[1]} I={test[2]} I={test[3]}
        O={output.bam}
        M={output.txt}
        TMP_DIR=`pwd`/tmp
        2>{log}
        """
Then I could work out how to generate this list without hard-coding the sample names, but I don't know whether this is possible (or what the correct syntax would be).
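To make the idea concrete, here is a minimal plain-Python sketch of building that list from config values instead of typing it out. It assumes `config["flowcell"]` and `config["index"]` each hold a single value (hard-coded below so the snippet runs on its own), and the names `lanes`, `lane_bams`, and `gatk_inputs` are made up for illustration.

```python
# Assumed values; in the Snakefile these would come from
# config["flowcell"] and config["index"].
fc = "V350134767"
index = "3"
lanes = ["L01", "L02", "L03", "L04"]  # assumed fixed set of four lanes

def lane_bams(fc, index, lanes):
    """Build one piped BAM path per lane (hypothetical helper)."""
    return [f"result/gatk4/{fc}_{lane}_{index}_piped.bam" for lane in lanes]

test = lane_bams(fc, index, lanes)

# One "I=<path>" argument per file, instead of indexing test[0]..test[3].
gatk_inputs = " ".join(f"I={path}" for path in test)
print(gatk_inputs)
```

Joining the paths this way would also keep the command independent of the number of lanes.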
Please help.
I don't want the flowcell (fc) in the output name. Is there a way to make this work without it?
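One direction that might work, sketched under assumptions: Snakemake lets a rule's `input` be a function that receives the matched `wildcards` and returns the file paths, so `fc` could come from the config rather than from the output name. The snippet below assumes a single flowcell value and four fixed lanes. `markdup_inputs` and `LANES` are hypothetical names, and `SimpleNamespace` only stands in for the wildcards object Snakemake would pass.

```python
from types import SimpleNamespace

fc = "V350134767"  # assumed: a single flowcell, e.g. from config["flowcell"]
LANES = ["L01", "L02", "L03", "L04"]  # assumed fixed set of lanes

def markdup_inputs(wildcards):
    """Input function: only {index} comes from the output name, fc from config."""
    return [
        f"result/gatk4/{fc}_{lane}_{wildcards.index}_piped.bam"
        for lane in LANES
    ]

# In the Snakefile this might be wired up roughly as:
#   rule MarkDuplicates:
#       input:
#           markdup_inputs
#       output:
#           bam="result/gatk4/samplename_index{index}_markedduplicates.bam",
#           ...

# Stand-in for the wildcards Snakemake would match from the output file.
demo_wildcards = SimpleNamespace(index="3")
paths = markdup_inputs(demo_wildcards)
print(paths)
```

With this shape the output pattern only needs `{index}`, since nothing in the input has to be inferred for `fc`.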