Interval list as a wildcard in snakemake
1
0
Entering edit mode
4.0 years ago
User000 ▴ 710

Dear all,

I'm doing variant calling with GATK. My reference genome is a fasta file with 200K scaffolds and I prepared a list called interval.list:

scaffold1
scaffold18
scaffold23

I want to use this interval list as (-L) wildcards, to accelerate/parallelise the process. How to do this?

 INTERVAL= "interval.list"
     rule all:
         input:
             expand("{int}.vcf", int=INTERVAL)
     rule gatk:
         input:
             ref_done="ref.fasta",
            bam="bam.list"
        output:
            outf ="{int}.vcf"
        shell:
             """
             /Tools/gatk/gatk --java-options "-Xmx16g -XX:ParallelGCThreads=10" HaplotypeCaller -L {wildcards.int} -R {input.ref_done} -I {input.bam} -O {output.outf}
             """
     rule merge:
         input:
             int="{int}.vcf"
         output:
             outf = "results/res.vcf"
         shell:
             "java -jar /Tools/picard/build/libs/picard.jar GatherVcfs {wildcards.int} O={output.outf}"
snakemake wildcards • 2.0k views
ADD COMMENT
1
Entering edit mode

you are almost there. actually read in the intervals file into a python list (what you call INTERVAL), set a target rule that uses that list as input (you did this already). Your wildcard rule is set up properly. The merge rule will take some modification because it can't be a wildcard loop but the input is the whole list of outputs.

ADD REPLY
0
Entering edit mode

So I do not know how to read the interval.list file..... as a normal python list?

ADD REPLY
1
Entering edit mode

yes open the file and append each line to the list in normal python

ADD REPLY
0
Entering edit mode

I added what I did so far, does it look good? seems to be working

ADD REPLY
0
Entering edit mode

no, this merges only scaffold1.... :/

ADD REPLY
0
Entering edit mode
4.0 years ago
User000 ▴ 710

Something like this? Please if you have suggestions how to improve the code let me know since I am learning snakemake it could be very important for me

f = open('Interval2.list')
my_interval = f.read().split("\n")
f.close()

rule gatk:
    input:
        gatk_ref_done="ref.fasta",
        bam="bam.list",
    output:
        outf ="scaffolds/{int}.vcf"
    shell:
        """
        /Tools/gatk/gatk --java-options "-Xmx16g -XX:ParallelGCThreads=10" HaplotypeCaller -L {wildcards.int} -R {input.gatk_ref_done} -I {input.bam} -O {output.outf}
        """

rule merge:
    input:
        lambda w: expand("scaffolds/{int}.vcf", int=my_interval)
    output:
        outf = "rh.vcf"
    params:
       lambda w: "I=" + " I=".join(expand("scaffolds/{int}.vcf", int=my_interval))
    shell:
        "java -jar /Tools/picard/build/libs/picard.jar GatherVcfs {params} O={output.outf}"
ADD COMMENT
0
Entering edit mode

this should work without your lambda w:. Just get rid of those and type snakemake merge

ADD REPLY

Login before adding your answer.

Traffic: 1686 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6