Hello,
I have two questions about writing a snakemake rule for GenomicsDBImport for GATK:
Question 1: When using GenomicsDBImport with sample map, it is not clear to me what is the input since the actual inputs are the GVCF files indicated in the sample map but one does not specify them literally in the command line. Please see after the questions for the sample code.
Question 2: The output of the GenomicsDBImport is a directory. How should I indicate a directory as output in the rule all command?
rule genomicsdbimport:
input:
sample_map="config/sample_map/interval_{num}.sample_map"
output:
db=directory("results/genomicsdb/interval_{num}_gdb")
resources:
threads=10,
mem_mb=64000,
runtime="48h",
tmpdir="$SNIC_TMP"
params:
mem= "-Xmx64g",
interval="config/intervals/interval_{num}.bed"
benchmark:
"benchmark/gatk_haplotypecaller/{sample}.{num}.genomicsdbimport.txt"
singularity:
"containers/gatk_4.6.0.0.sif"
shell:
"""
gatk --java-options {params.mem} GenomicsDBImport \
--genomicsdb-workspace-path {output} \
--intervals {params.interval} \
--batch-size 50 \
--sample-name-map {input.sample_map} \
--tmp-dir {resources.tmpdir} \
--reader-threads {resources.threads}
"""
So maybe a rule all:
rule all:
expand(directory("results/genomicsdb/interval_{num}_gdb"), num=num=[i for i in range(1, 20)])
But this does not seem right. Thanks so much for your help!
I see, thanks! This is how I've defined it in the rule itself. I'll see if it works in the rule all as well.
Yes, I see that sample map contains the path to gvcf files but the sample map itself, is it the input of the rule?
Maybe I'm overthinking this, I'll just run it see what I'll get.
Thanks!