Is it possible to run HaplotypeCaller in GNU Parallel?
I have a bash script, HaplotypeCaller.sh. Is it possible to run it with GNU Parallel?
cat HaplotypeCaller.sh | parallel
#!/bin/bash
for i in *.bam; do
gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$i" -O "$i".vcf.gz --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
done
#!/bin/bash
do_one() {
gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$1" -O "$1".vcf.gz --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
}
export -f do_one
# test that do_one works on a single bam file. Then:
parallel do_one ::: *.bam
If you do not want the output to be named foo.bam.vcf.gz but instead foo.vcf.gz:
#!/bin/bash
do_one() {
gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$1" -O "$2" --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
}
export -f do_one
# test that do_one works on a single bam file:
# do_one foo.bam foo.vcf.gz
# Then:
parallel do_one {} {.}.vcf.gz ::: *.bam
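To check which output name each BAM would get before launching any jobs, the effect of the `{.}` replacement string (the input with its last extension removed) can be mimicked in plain bash with `${1%.*}`. This is only a minimal sketch: `vcf_name_for` is a hypothetical helper, not part of GATK or GNU Parallel, and the two file names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: derive the VCF name for one BAM by dropping the
# last extension (like {.} in GNU Parallel) and appending .vcf.gz.
vcf_name_for() {
  printf '%s.vcf.gz\n' "${1%.*}"
}

# Preview the input/output pairs without running gatk.
# The file names here are placeholders.
for bam in sample1.bam /data/run2/sample2.bam; do
  printf '%s -> %s\n' "$bam" "$(vcf_name_for "$bam")"
done
```

If the printed pairs look right, the same naming can be handed to `do_one` as its second argument.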
With Nextflow (not tested):
params.ref="/media/gatk/Homo_sapiens_assembly38.fasta"
params.bams=""
params.dbsnp="/media/gatk/dbsnp_138.hg38.vcf.gz"
params.intervals="/media/gatk/target/ag_V6_list.interval_list"
Channel.fromPath(params.bams).splitCsv(header: false,sep:',',strip:true).map{T->file(T[0])}.set{bamfiles}
process hapCaller {
tag "${bam.name}"
memory "40g"
input:
file bam from bamfiles
output:
file("${bam.getBaseName()}.vcf.gz") into vcf
script:
"""
gatk --java-options " -Xmx${task.memory.giga}g -Djava.io.tmpdir=." HaplotypeCaller \
-R ${params.ref} \
-I ${bam.toRealPath()} \
-O "${bam.getBaseName()}.vcf.gz" \
--dbsnp "${params.dbsnp}" \
-L "${params.intervals}"
"""
}
The answers given by Malcolm.Cook, ole.tange and cpad0112 work well.
Method 1:
parallel gatk --java-options "-Xmx40g" HaplotypeCaller \
-R /media/gatk/Homo_sapiens_assembly38.fasta -I {} -O {.}.vcf.gz \
--dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz \
-L /media/gatk/target/ag_V6_list.interval_list ::: *.bam
Method 2:
#!/bin/bash
do_one() {
gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$1" -O "$1".vcf.gz --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
}
export -f do_one
# test that do_one works on a single bam file. Then:
parallel do_one ::: *.bam
Something like this should work
I get the following error when running the above command
@4galaxy77
rev | cut -c4- removes the .bam extension, but {.} in parallel already removes the extension.
See if the following prints the exact commands you want to run, and then remove the --dry-run option.
Make sure that the system has enough resources to execute the parallel commands. Otherwise, limit memory/threads/CPUs.
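Since each job in the scripts in this thread asks for a 40 GB Java heap (`-Xmx40g`), one way to limit resource use is to cap the number of simultaneous jobs with GNU Parallel's `-j` option. The following is a rough sketch, assuming a Linux machine with `/proc/meminfo`; `max_jobs` is a hypothetical helper, and `do_one` is assumed to be the exported function from the answers above.

```shell
#!/bin/bash
# Hypothetical helper: how many jobs of per_job_gb fit into total_gb,
# never returning less than 1.
max_jobs() {
  local total_gb=$1 per_job_gb=$2
  local n=$(( total_gb / per_job_gb ))
  (( n < 1 )) && n=1
  echo "$n"
}

# Total RAM in GB, read from /proc/meminfo (Linux only; the value is in kB).
total_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo 2>/dev/null)

# Print the command that would be run; drop the echo to actually run it.
# 40 matches the -Xmx40g heap size used in this thread.
echo parallel -j "$(max_jobs "$total_gb" 40)" do_one ::: '*.bam'
```

The JVM needs some memory beyond its heap, so the real limit is probably a little lower than this estimate. GNU Parallel also has a `--memfree` option that delays starting new jobs until a given amount of memory is free, which may be a simpler alternative.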
Use a workflow manager.