I am trying to write a rule whose input contains a wildcard ({fc}) that does not appear in its output, like so:
rule MarkDuplicates:
    input:
        L01="result/gatk4/{fc}_L01_{index}_piped.bam",
        L02="result/gatk4/{fc}_L02_{index}_piped.bam",
        L03="result/gatk4/{fc}_L03_{index}_piped.bam",
        L04="result/gatk4/{fc}_L04_{index}_piped.bam"
    output:
        bam="result/gatk4/samplename_index{index}_markedduplicates.bam",
        txt="result/gatk4/samplename_index{index}_markedduplicates_metrics.txt"
    log:
        "result/logs/markduplicates/samplename_index{index}.out"
    benchmark:
        "result/benchmarks/samplename_index{index}.md.out"
    container:
        config["containers"]["gatk4"]
    threads: 8
    shell:
        """
        gatk --java-options '-Xmx30G' MarkDuplicates \
        I={input.L01} I={input.L02} I={input.L03} I={input.L04} \
        O={output.bam} \
        M={output.txt} \
        TMP_DIR=`pwd`/tmp \
        2>{log}
        """
But I know that every wildcard used in a rule's input must also appear in its output. I tried having the previous rule touch a marker file and moving the real input files into params, like so:
configfile: "config.yaml"

fc = config["flowcell"]
index = config["index"]
samplename = config["sn"]
lane = ["L01","L02","L03","L04"]
con_ref = config["ref"]

rule all:
    input:
        expand(["result/gatk4/{fc}_L01_{index}_piped.bam",
                "result/gatk4/{fc}_L02_{index}_piped.bam",
                "result/gatk4/{fc}_L03_{index}_piped.bam",
                "result/gatk4/{fc}_L04_{index}_piped.bam",
                "result/gatk4/SamToFastq.done",
                "result/gatk4/samplename_index{index}_markedduplicates.bam"],
               fc = fc, lane = lane, index = index)

rule SamToFastq:
    input:
        "result/gatk4/{fc}_{lane}_{index}_markadapters.bam",
    output:
        unmapped="result/gatk4/{fc}_{lane}_{index}_fastqtosam.bam",
        out="result/gatk4/{fc}_{lane}_{index}_piped.bam",
        tmp="result/gatk4/SamToFastq.done"
    log:
        "result/logs/samtofastq/{fc}_{lane}_{index}.out"
    benchmark:
        "result/benchmarks/{fc}_{lane}_{index}.sam2fq.out"
    params:
        ref=con_ref
    container:
        config["containers"]["gatk4"]
    threads: 8
    shell:
        """
        gatk --java-options '-Xmx8G' SamToFastq \
        I={input} \
        FASTQ=/dev/stdout \
        CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=2 INTERLEAVE=true |
        bwa mem -M -t {threads} -p {params.ref} /dev/stdin |
        gatk --java-options '-Xmx20G' MergeBamAlignment \
        ALIGNED_BAM=/dev/stdin \
        UNMAPPED_BAM={output.unmapped} \
        OUTPUT={output.out} \
        REF={params.ref} CREATE_INDEX=true ADD_MATE_CIGAR=true \
        CLIP_ADAPTERS=false CLIP_OVERLAPPING_READS=true \
        INCLUDE_SECONDARY_ALIGNMENTS=true MAX_INSERTIONS_OR_DELETIONS=-1 \
        PRIMARY_ALIGNMENT_STRATEGY=MostDistant ATTRIBUTES_TO_RETAIN=XS \
        TMP_DIR=`pwd`/tmp \
        2>{log} &&
        touch {output.tmp}
        """

rule MarkDuplicates:
    input:
        "result/gatk4/SamToFastq.done"
    output:
        bam="result/gatk4/samplename_index{index}_markedduplicates.bam",
        txt="result/gatk4/samplename_index{index}_markedduplicates_metrics.txt"
    log:
        "result/logs/markduplicates/samplename_index{index}.out"
    benchmark:
        "result/benchmarks/samplename_index{index}.md.out"
    container:
        config["containers"]["gatk4"]
    params:
        L01="result/gatk4/{fc}_L01_{index}_piped.bam",
        L02="result/gatk4/{fc}_L02_{index}_piped.bam",
        L03="result/gatk4/{fc}_L03_{index}_piped.bam",
        L04="result/gatk4/{fc}_L04_{index}_piped.bam"
    threads: 8
    shell:
        """
        gatk --java-options '-Xmx30G' MarkDuplicates \
        I={params.L01} I={params.L02} I={params.L03} I={params.L04} \
        O={output.bam} \
        M={output.txt} \
        TMP_DIR=`pwd`/tmp \
        2>{log}
        """
This generates an error:
SyntaxError:
Not all output, log and benchmark files of rule SamToFastq contain the same wildcards. This is crucial though, in order to avoid that two or more jobs write to the same file.
I then tried creating the touch file another way:
configfile: "config.yaml"

fc = config["flowcell"]
index = config["index"]
samplename = config["sn"]
lane = ["L01","L02","L03","L04"]
con_ref = config["ref"]

rule all:
    input:
        expand(["result/gatk4/{fc}_L01_{index}_piped.bam",
                "result/gatk4/{fc}_L02_{index}_piped.bam",
                "result/gatk4/{fc}_L03_{index}_piped.bam",
                "result/gatk4/{fc}_L04_{index}_piped.bam",
                "result/gatk4/SamToFastq.done",
                "result/gatk4/samplename_index{index}_markedduplicates.bam"],
               fc = fc, lane = lane, index = index)

rule SamToFastq:
    input:
        "result/gatk4/{fc}_{lane}_{index}_markadapters.bam",
    output:
        unmapped="result/gatk4/{fc}_{lane}_{index}_fastqtosam.bam",
        out="result/gatk4/{fc}_{lane}_{index}_piped.bam",
        touch("result/gatk4/SamToFastq.done")
    log:
        "result/logs/samtofastq/{fc}_{lane}_{index}.out"
    benchmark:
        "result/benchmarks/{fc}_{lane}_{index}.sam2fq.out"
    params:
        ref=con_ref
    container:
        config["containers"]["gatk4"]
    threads: 8
    shell:
        """
        gatk --java-options '-Xmx8G' SamToFastq \
        I={input} \
        FASTQ=/dev/stdout \
        CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=2 INTERLEAVE=true |
        bwa mem -M -t {threads} -p {params.ref} /dev/stdin |
        gatk --java-options '-Xmx20G' MergeBamAlignment \
        ALIGNED_BAM=/dev/stdin \
        UNMAPPED_BAM={output.unmapped} \
        OUTPUT={output.out} \
        REF={params.ref} CREATE_INDEX=true ADD_MATE_CIGAR=true \
        CLIP_ADAPTERS=false CLIP_OVERLAPPING_READS=true \
        INCLUDE_SECONDARY_ALIGNMENTS=true MAX_INSERTIONS_OR_DELETIONS=-1 \
        PRIMARY_ALIGNMENT_STRATEGY=MostDistant ATTRIBUTES_TO_RETAIN=XS \
        TMP_DIR=`pwd`/tmp \
        2>{log}
        """

rule MarkDuplicates:
    input:
        "result/gatk4/SamToFastq.done"
    output:
        bam="result/gatk4/samplename_index{index}_markedduplicates.bam",
        txt="result/gatk4/samplename_index{index}_markedduplicates_metrics.txt"
    log:
        "result/logs/markduplicates/samplename_index{index}.out"
    benchmark:
        "result/benchmarks/samplename_index{index}.md.out"
    container:
        config["containers"]["gatk4"]
    params:
        L01="result/gatk4/{fc}_L01_{index}_piped.bam",
        L02="result/gatk4/{fc}_L02_{index}_piped.bam",
        L03="result/gatk4/{fc}_L03_{index}_piped.bam",
        L04="result/gatk4/{fc}_L04_{index}_piped.bam"
    threads: 8
    shell:
        """
        gatk --java-options '-Xmx30G' MarkDuplicates \
        I={params.L01} I={params.L02} I={params.L03} I={params.L04} \
        O={output.bam} \
        M={output.txt} \
        TMP_DIR=`pwd`/tmp \
        2>{log}
        """
But I get the error:
positional argument follows keyword argument
Is there a way to work around having different wildcards in a rule's input and output? If so, how would I achieve this?
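A note on the second error, since it comes from Python syntax rather than from wildcards: the entries under output: are handled like the arguments of a Python call, so an unnamed entry such as touch(...) cannot follow named ones like unmapped=... and out=.... Below is a minimal sketch in plain Python that reproduces the message. The function `output` is only a hypothetical stand-in for illustration, not Snakemake's actual API, and the file names are shortened placeholders.

```python
# Hypothetical stand-in: a rule's output entries behave like call arguments.
def output(*paths, **named_paths):
    return list(paths), dict(named_paths)

# Named entries first, unnamed entry last: the Python parser rejects this.
bad_call = 'output(unmapped="a.bam", out="b.bam", "SamToFastq.done")'
try:
    compile(bad_call, "<sketch>", "eval")
    error_message = None
except SyntaxError as err:
    error_message = err.msg

# Unnamed entry first (or every entry named) parses and runs fine.
positional, named = output("SamToFastq.done", unmapped="a.bam", out="b.bam")

print(error_message)
print(positional, named)
```

Naming the marker (for example done=touch(...)) or listing it before the named entries should make this message go away, but the marker file would still need the same wildcards as the other outputs, which brings back the first error.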
Hello @sheryl, sorry, my bad.
Hmm, your rule MarkDuplicates looks fine.
If the config file is not too big, can you share it? I think the problem is with the wildcard notation again.
Also, can you share the names of the input files and the desired output structure? I think I know what is happening, but I want to be sure.
My guess is that the MarkDuplicates rule can't figure out the names of the outputs, because you may have two "{fc}" values with the same "{index}".
What you can do is add {fc} to the name of the output (if possible):
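A minimal sketch of that idea, not necessarily the exact Snakefile that was tested: once {fc} is part of the output name, the four lane BAMs can be derived from the output's wildcards. The output file name `MARKED_BAM`, the helper `lane_bams` and the example wildcard values are assumptions for illustration; the input pattern is the one from the question.

```python
from types import SimpleNamespace

LANES = ["L01", "L02", "L03", "L04"]

# Pattern of the per-lane BAMs, as in the question.
LANE_BAM = "result/gatk4/{fc}_{lane}_{index}_piped.bam"
# Assumed output name: the question's name with {fc} added.
MARKED_BAM = "result/gatk4/{fc}_index{index}_markedduplicates.bam"

def lane_bams(wildcards):
    """Input function: one piped BAM per lane for this flowcell and index."""
    return [
        LANE_BAM.format(fc=wildcards.fc, lane=lane, index=wildcards.index)
        for lane in LANES
    ]

# In the Snakefile this would be used roughly as:
#   rule MarkDuplicates:
#       input: lane_bams
#       output: bam=MARKED_BAM
# Here a SimpleNamespace stands in for the wildcards object Snakemake passes.
example = SimpleNamespace(fc="FC1", index="42")  # hypothetical values
inputs = lane_bams(example)
print(inputs)
```

With {fc} in the output, the named inputs L01 to L04 from the first snippet in the question can also use {fc} and {index} directly, without a helper function, since every input wildcard then appears in the output.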
I tried this here with Snakemake 7.8.5 and it worked. But maybe I got your file structure wrong; please clarify if so.
The resulting DAG looks like this:
Thanks for your answer, hugo.avila.
As the original question evolved slightly (I didn't want to use the dummy file result/gatk4/SamToFastq.done in the input, but thought I had to in order to get around the problematic {fc}), I posted an updated version as a linked question. In the end I gave up trying to keep {fc} out of the output (re. the linked question, as I couldn't quite follow what was going on and I kept getting errors), and this answer works well. Thanks for your help - much appreciated, and lots of learning had.