GenomicsDBImport with sample map in snakemake
1
0
Entering edit mode
8 weeks ago
hpapoli ▴ 150

Hello,

I have two questions about writing a snakemake rule for GenomicsDBImport for GATK:

Question 1: When using GenomicsDBImport with sample map, it is not clear to me what is the input since the actual inputs are the GVCF files indicated in the sample map but one does not specify them literally in the command line. Please see after the questions for the sample code.

Question 2: The output of the GenomicsDBImport is a directory. How should I indicate a directory as output in the rule all command?

rule genomicsdbimport:
    input:
        sample_map="config/sample_map/interval_{num}.sample_map"
    output:
        db=directory("results/genomicsdb/interval_{num}_gdb")
    resources:
        threads=10,
        mem_mb=64000,
        runtime="48h",
        tmpdir="$SNIC_TMP"
    params:
        mem= "-Xmx64g",
        interval="config/intervals/interval_{num}.bed"
    benchmark:
        "benchmark/gatk_haplotypecaller/{sample}.{num}.genomicsdbimport.txt"
    singularity:
        "containers/gatk_4.6.0.0.sif"
    shell:
        """
        gatk --java-options {params.mem} GenomicsDBImport \
        --genomicsdb-workspace-path {output} \
        --intervals {params.interval} \
        --batch-size 50 \
        --sample-name-map {input.sample_map} \
        --tmp-dir {resources.tmpdir} \
        --reader-threads {resources.threads}
        """

So maybe a rule all:

rule all:
    expand(directory("results/genomicsdb/interval_{num}_gdb"), num=num=[i for i in range(1, 20)]) 

But this does not seem right. Thanks so much for your help!

snakemake • 234 views
ADD COMMENT
0
Entering edit mode
8 weeks ago

The output of the GenomicsDBImport is a directory. How should I indicate a directory as output in the rule all command?

directory("DIR")

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#directories-as-outputs

When using GenomicsDBImport with sample map, it is not clear to me what is the input since the actual inputs are the GVCF files indicated in the sample map but one does not specify them literally in the command line

Path to file containing a mapping of sample name to file uri in tab delimited format. I

   $ cat sample.map
    sample1 /path/to/sample1.g.vcf.gz
    sample2 /path/to/sample2.g.vcf.gz
    sample3 /path/to/sample3.g.vcf.gz
ADD COMMENT
0
Entering edit mode

I see, thanks! This is how I've defined it in the rule itself. I'll see if it works in the rule all as well.

Yes, I see that sample map contains the path to gvcf files but the sample map itself, is it the input of the rule?

Maybe I'm overthinking this, I'll just run it see what I'll get.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2412 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6