Is it possible to run variant calling software in parallel for a 1000 of bacterial BAMs, are there any Shell and Python/R code examples available?
Is it possible to run variant calling software in parallel for a 1000 of bacterial BAMs, are there any Shell and Python/R code examples available?
One VCF per BAM ? create the following Makefile
SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run
$$(addsuffix .vcf,$$(basename $(1))): $(1)
samtools mpileup -u -f ref.fa $$< | bcftools call -vcg - > $$@
endef
all: $(addsuffix .vcf,$(basename ${BAMS}))
$(eval $(foreach B,${BAMS},$(call run,${B})))
run it in parallel using the option -j <jobs>
of make
make -j 16
Dear Pierre, I get an error with separators :
makefile:13: * missing separator. Stop.
For this input makefile:
SHELL=/bin/bash BAMS=$(shell ls -1 *.bam) define run $$(addsuffix .vcf,$$(basename $(1))): $(1) samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar /home/fil/VarScan.v2.3.9.jar mpileup2indel --min-coverage 10 --min-avg-qual 15 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.1 --output-vcf 1 > $$@ endef all: $(addsuffix .vcf,$(basename ${BAMS})) $(eval $(foreach B,${BAMS},$(call run,${B})))
Could you help me where could be a problem? I know about make file problem with tab del. But if I used
cat -e -t -v makefile
I did not see any problem. Thank you.
Peirre , can I have another question, please? Is it possible to run two makefiles in once time, because varscan calls separately snps and indels. If I tried to add another row to makefile with "samtools...... | varscan mpileup2snp..." second command was not parallelized.
I tired this, but doesn't work:
SHELL=/bin/bash BAMS=$(shell ls -1 *.bam) define run $$(addsuffix .vcf,$$(basename $(1))): $(1) samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar /home/fil/VarScan.v2.3.9.jar mpileup2snp --min-coverage 10 --min-avg-qual 25 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.05 --output-vcf 1 > $$@ samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar /home/fil/VarScan.v2.3.9.jar mpileup2indel --min-coverage 10 --min-avg-qual 25 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.05 --output-vcf 1 > $$@ endef all: $(addsuffix .vcf,$(basename ${BAMS})) $(eval $(foreach B,${BAMS},$(call run,${B})))
Yes, it is possible. One method would be using parallel, see F: Got Multiple Cpus? Activate Them With Gnu Parallel and GNU Parallel for mutlithreaded tasks for a primer.
bcbio is another option, it is a scalable pipeline for variant analysis.
This is an exome variant calling pipeline per sample to generate the bam and the SNP and indels VCF with GATK per sample, you can stop at the step of last bam file generation and use something like mutect2 or varscan for somatic calls and this can be easily parallelised with shell/bash. The workflow is here.
sorry, this is something you have to read from manuals as @Wouter said. Also, you should try to learn. for GATK now you have WDL which works on Cromwell engine. You can take a look at the GATK 3.7 and WDL to come up with your code for parallelization. If you have problems their forum will help you.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Variant calling tools, like almost all tools, can be parallized with GNU parallel. A little more information about which tool you are using would be helpful though.