Is it possible to run HaplotypeCaller in GNU parallel?
I have a bash script, HaplotypeCaller.sh. Is it possible to run it with GNU parallel?
cat HaplotypeCaller.sh | parallel
#!/bin/bash
for i in *.bam; do
  gatk --java-options "-Xmx40g" HaplotypeCaller \
    -R /media/gatk/Homo_sapiens_assembly38.fasta \
    -I "$i" -O "$i.vcf.gz" \
    --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz \
    -L /media/gatk/target/ag_V6_list.interval_list
done
#!/bin/bash
do_one() {
  gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$1" -O "$1".vcf.gz --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
}
export -f do_one
parallel do_one ::: *.bam
If you want the output to be named foo.vcf.gz instead of foo.bam.vcf.gz:
#!/bin/bash
# $1 = input BAM, $2 = output VCF
do_one() {
  gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$1" -O "$2" --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
}
export -f do_one
# {} is the input file; {.} is the input file with its last extension removed
parallel do_one {} {.}.vcf.gz ::: *.bam
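GNU parallel's {.} replacement string is the input name with its last extension removed. As a rough illustration of that renaming, here is a plain-bash sketch that does not call parallel or gatk; the function name vcf_name and the sample file names are hypothetical and only serve the example.

```shell
#!/bin/bash
# Derive the VCF name from a BAM name by stripping the .bam suffix.
# This mirrors what {.} followed by .vcf.gz yields for *.bam inputs.
vcf_name() {
    local bam="$1"
    printf '%s.vcf.gz\n' "${bam%.bam}"
}

# Hypothetical file names, for illustration only.
for bam in sample1.bam run2.sorted.bam; do
    echo "$bam -> $(vcf_name "$bam")"
done
```

Only the final .bam is removed, so a name such as run2.sorted.bam keeps its .sorted part.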
Nextflow (not tested):
params.ref = "/media/gatk/Homo_sapiens_assembly38.fasta"
params.bams = ""
params.dbsnp = "/media/gatk/dbsnp_138.hg38.vcf.gz"
params.intervals = "/media/gatk/target/ag_V6_list.interval_list"

// params.bams is a file listing one BAM path per line
Channel.fromPath(params.bams)
    .splitCsv(header: false, sep: ',', strip: true)
    .map { T -> file(T[0]) }
    .set { bamfiles }

process hapCaller {
    tag "${bam.name}"
    memory "40g"
    input:
        file bam from bamfiles
    output:
        file("${bam.getBaseName()}.vcf.gz") into vcf
    script:
    """
    gatk --java-options "-Xmx${task.memory.giga}g -Djava.io.tmpdir=." HaplotypeCaller \
        -R ${params.ref} \
        -I ${bam.toRealPath()} \
        -O "${bam.getBaseName()}.vcf.gz" \
        --dbsnp "${params.dbsnp}" \
        -L "${params.intervals}"
    """
}
The answers given by Malcolm.Cook, ole.tange and cpad0112 work well.
Method 1:
parallel gatk --java-options "-Xmx40g" HaplotypeCaller \
  -R /media/gatk/Homo_sapiens_assembly38.fasta -I {} -O {.}.vcf.gz \
  --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz \
  -L /media/gatk/target/ag_V6_list.interval_list ::: *.bam
Method 2:
#!/bin/bash
do_one() {
  gatk --java-options "-Xmx40g" HaplotypeCaller -R /media/gatk/Homo_sapiens_assembly38.fasta -I "$1" -O "$1".vcf.gz --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz -L /media/gatk/target/ag_V6_list.interval_list
}
export -f do_one
parallel do_one ::: *.bam
Something like this should work
I get the following error while running the above command
@4galaxy77: rev | cut -c4- removes the .bam extension, but {.} in parallel already removes the extension. First run the command with --dry-run to check that it prints the exact commands you want to run, then remove the --dry-run option. Make sure that the system has enough resources to execute the parallel commands; otherwise, limit memory, threads, or CPUs.
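As a minimal sketch of that advice, the snippet below estimates how many 40 GB jobs fit in memory and previews the commands with --dry-run and -j before anything is executed. The helper max_jobs and the 128 GB figure are assumptions for illustration; substitute the RAM of your own machine. The preview step is skipped if no parallel binary is on the PATH, and it assumes that binary is GNU parallel.

```shell
#!/bin/bash
# Hypothetical helper: how many jobs of per_job_gb fit into total_gb (at least 1).
max_jobs() {
    local total_gb="$1" per_job_gb="$2"
    local n=$(( total_gb / per_job_gb ))
    if (( n < 1 )); then n=1; fi
    echo "$n"
}

# Assumption: 128 GB of RAM, 40 GB per HaplotypeCaller job (-Xmx40g).
njobs=$(max_jobs 128 40)
echo "running at most $njobs jobs at a time"

# --dry-run prints the commands instead of running them;
# -j caps the number of simultaneous jobs.
if command -v parallel >/dev/null 2>&1; then
    parallel --dry-run -j "$njobs" \
        gatk --java-options "-Xmx40g" HaplotypeCaller \
        -R /media/gatk/Homo_sapiens_assembly38.fasta -I {} -O {.}.vcf.gz \
        --dbsnp /media/gatk/dbsnp_138.hg38.vcf.gz \
        -L /media/gatk/target/ag_V6_list.interval_list ::: *.bam
fi
```

Once the printed commands look right, drop --dry-run and keep -j so that the combined Java heaps stay within the available memory.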
Use a workflow manager.