11 months ago
Pablo
I have a checkpoint that produces an unknown number of files, so the DAG has to be re-evaluated after it runs. I wanted to speed it up by parallelizing the command by chromosome, and then, after re-evaluation, merge all chromosomes per experiment. However, Snakemake is unable to infer the chromosomes.
checkpoint GenomeAnalysisTK:
    input:
        bamlist = rules.RealignerTargetCreator.output.bamlist,
        intervals = rules.RealignerTargetCreator.output.intervals,
        fasta = fasta
    output:
        temp(directory("splits/{chromosome}"))
    conda:
        "gatk3"
    wildcard_constraints:
        chromosome='|'.join([x for x in detect_chromosomes(fai)]),
    shell:
        """
        mkdir -p {output} && cd {output}
        gatk3 -Xmx24g -T IndelRealigner -I {input.bamlist} -targetIntervals {input.intervals} -L {wildcards.chromosome} -R {input.fasta} -compress 0 --nWayOut .{wildcards.chromosome}.indelrealigned.bam
        """

def agg(wildcards):
    checkpoints.GenomeAnalysisTK.get(**wildcards).output[0]
    return expand("splits/{chromosome}/{{experiment}}.merged.{chromosome}.indelrealigned.bam", chromosome=get_chromosomes(fai))

rule merge_realigned:
    input:
        agg
    output:
        "{patient}/{sample}/{experiment}.merged.indelrealigned.bam"
    threads:
        config["other_threads"],
    params:
        compression_level = 0
    wildcard_constraints:
        chromosome='|'.join([x for x in detect_chromosomes(fai)]),
    shell:
        "samtools merge -@ {threads} -l {params.compression_level} {output} {input}"
However, I get the typical "WorkflowError: Missing wildcard values for chromosome". How can I make it infer the chromosomes? I think the main issue is that I scatter by chromosome but don't use that wildcard anywhere else.
There are some references in those rules (detect_chromosomes, get_chromosomes, rules.RealignerTargetCreator) that aren't shown here, so it's hard to know how to help. If you make a minimal self-contained example (maybe hardcoding some of those values), I expect you'd be more likely to get some answers.
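For what it's worth, here is a rough sketch of one thing that might be going on, with the chromosome list hardcoded as suggested. The wildcards of merge_realigned are patient, sample and experiment, so checkpoints.GenomeAnalysisTK.get(**wildcards) never receives a chromosome value, which would be consistent with the error message. A possible workaround is to request the checkpoint once per chromosome inside the input function. Everything below is an assumption rather than a tested fix: CHROMOSOMES stands in for whatever detect_chromosomes(fai) returns, make_agg is a hypothetical wrapper that exists only so the function can be exercised outside Snakemake, and the file name pattern is copied from the expand() call in the question.

```python
# Hypothetical, hardcoded stand-in for detect_chromosomes(fai) / get_chromosomes(fai).
CHROMOSOMES = ["chr1", "chr2", "chrX"]


def make_agg(checkpoints, chromosomes=CHROMOSOMES):
    """Build an input function for merge_realigned.

    `checkpoints` is the object Snakemake exposes inside a Snakefile. It is
    passed in explicitly here only so the sketch can run outside Snakemake.
    """

    def agg(wildcards):
        bams = []
        for chrom in chromosomes:
            # Supply the chromosome explicitly: the wildcards of
            # merge_realigned (patient, sample, experiment) do not contain it.
            outdir = checkpoints.GenomeAnalysisTK.get(chromosome=chrom).output[0]
            # File name pattern copied from the expand() call in the question.
            bams.append(
                f"{outdir}/{wildcards.experiment}.merged.{chrom}.indelrealigned.bam"
            )
        return bams

    return agg
```

Inside a Snakefile this would presumably be wired up as agg = make_agg(checkpoints), or the loop could simply be written directly in agg, since checkpoints is already available there.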