I'm working on the last step of our lab's well-established variant calling pipeline, running GATK GenotypeGVCFs on 4,392 whole-exome-sequenced individuals. In the past I haven't had any problems with this sort of thing, but on this last run the job kept being killed on the supercomputer cluster for exceeding its memory allocation. Now it appears that even with 16 threads and 64 GB of memory allocated, the log file predicts nearly 8 weeks of runtime remaining! I am using GATK 3.3 with the following arguments:
-T GenotypeGVCFs \
  -R /projects/resources/Homo_sapiens_assembly19.fasta \
  --variant /projects/combinedgvcfs/combined_gvcfs.list \
  --dbsnp /projects/resources/gatk_bundle/dbsnp_138.b37.vcf \
  -o /04_15_2016/genotype_gvcfs/04_15_2016_raw.vcf \
  -log /04_15_2016/genotype_gvcfs/04_15_2016_raw.log \
  -L /projects/resources/bed.and.interval.files/b37_refseqplus50_clean.bed \
  -nt 16 \
  --max_alternate_alleles 6
If anyone has any ideas, please let me know; in the past this step took no more than 72 hours to run to completion. I'm happy to provide any additional information that would help.
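One workaround I've been considering in the meantime is scattering the run by chromosome: split the capture BED into per-chromosome interval files, submit one GenotypeGVCFs job per chromosome, and concatenate the per-chromosome VCFs afterwards. A rough sketch (the heap size, output names, and jar location below are placeholders, not our actual settings):

```shell
# Hedged sketch of a scatter-by-chromosome workaround. Paths other than the
# reference, GVCF list, and BED are hypothetical placeholders.
BED=/projects/resources/bed.and.interval.files/b37_refseqplus50_clean.bed
OUTDIR=scatter_intervals
mkdir -p "$OUTDIR"

# Split the capture BED into one interval file per chromosome
# (column 1 of a BED line is the chromosome name).
awk -v dir="$OUTDIR" '{ print > (dir "/" $1 ".bed") }' "$BED"

# Print one GenotypeGVCFs command per chromosome; each would be submitted
# as its own cluster job. -Xmx16g is a guessed per-job heap size.
for bed in "$OUTDIR"/*.bed; do
    chrom=$(basename "$bed" .bed)
    echo java -Xmx16g -jar GenomeAnalysisTK.jar \
        -T GenotypeGVCFs \
        -R /projects/resources/Homo_sapiens_assembly19.fasta \
        --variant /projects/combinedgvcfs/combined_gvcfs.list \
        -L "$bed" \
        -o "genotype_gvcfs/raw.${chrom}.vcf"
done
```

Each job then only holds one chromosome's worth of sites in memory, at the cost of a merge step at the end. I haven't tried this at scale yet, so I'd welcome opinions on whether it's worth the extra bookkeeping.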
Maybe merge them first: http://gatkforums.broadinstitute.org/gatk/discussion/6312/merge-gvcf-files
Zaag,
Perhaps I should have specified: these 4,392 samples have already been combined into 29 GVCFs, one per sequencing run. If you think that combining them further would aid performance, I'm willing to give it a try.
Regards, Evan
No, that seems to be enough combining.