Entering edit mode
4.0 years ago
User000
▴
710
Dear all,
I'm doing variant calling with GATK. My reference genome is a fasta file with 200K scaffolds and I prepared a list called interval.list:
scaffold1
scaffold18
scaffold23
I want to use this interval list as (-L)
wildcards, to accelerate/parallelise the process. How to do this?
INTERVAL= "interval.list"
rule all:
input:
expand("{int}.vcf", int=INTERVAL)
rule gatk:
input:
ref_done="ref.fasta",
bam="bam.list"
output:
outf ="{int}.vcf"
shell:
"""
/Tools/gatk/gatk --java-options "-Xmx16g -XX:ParallelGCThreads=10" HaplotypeCaller -L {wildcards.int} -R {input.ref_done} -I {input.bam} -O {output.outf}
"""
rule merge:
input:
int="{int}.vcf"
output:
outf = "results/res.vcf"
shell:
"java -jar /Tools/picard/build/libs/picard.jar GatherVcfs {wildcards.int} O={output.outf}"
you are almost there. actually read in the intervals file into a python list (what you call
INTERVAL
), set a target rule that uses that list as input (you did this already). Your wildcard rule is set up properly. The merge rule will take some modification because it can't be a wildcard loop but the input is the whole list of outputs.So I do not know how to read the interval.list file..... as a normal python list?
yes open the file and append each line to the list in normal python
I added what I did so far, does it look good? seems to be working
no, this merges only scaffold1.... :/