Gatk - Haplotypecaller Is So Slow, What Is Faster And As Good?
3
7
11.3 years ago
newDNASeqer ▴ 790

I have 15 exome-seq samples and have been using a BWA-Picard-GATK pipeline for variant calling. I did not realize GATK was so slow until I had to analyze this many samples. In the HaplotypeCaller step, each sample seems to take at least 2 days (48+ hours). Is this normal, or is there something I did wrong? Below is my command. Is there anything wrong with it, or is GATK HaplotypeCaller known to be this slooooow?

java -Xmx10g -Djava.awt.headless=true -jar /Library/Java/Extensions/GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -minPruning 3 \
    -dcov 10 \
    -R ./GATK_ref/hg19.fasta \
    -I ./GATK/BQSR/sample1_realign.recal.compressed.bam \
    -o ./GATK/VQSR/sample1_realign.raw.snps_indels.vcf

What other variant calling program do you guys recommend? Can I use the above sample1_realign.recal.compressed.bam file (prepared by the GATK steps that come before HaplotypeCaller) with the program you recommend? Thank you.

PS: I am using GATK 2.5.

gatk • 17k views
0

As far as I know, no other variant caller is as "good" as GATK HaplotypeCaller (written 2014-05-21). If someone finds a better tool, please reply here.

12
11.3 years ago

There are in fact three other local de novo variant callers:

  • Platypus (from Andy Rimmer and Gerton Lunter in Oxford)
  • SGA (from Jared Simpson at the Sanger Institute and now OICR)
  • DISCOVAR (from David Jaffe's team at the Broad)

Neither these nor the GATK HaplotypeCaller have a published paper describing their methods or performance yet, but I've heard good things about all four (Platypus, SGA and HaplotypeCaller have been heavily tested and used in the 1000 Genomes Project), and I believe papers are in progress.

There are also two global de novo variant callers:

  • Cortex (from me amongst others), published last year: De novo assembly and genotyping of variants using colored de Bruijn graphs. Z Iqbal, M Caccamo, I Turner, P Flicek, G McVean, Nature Genetics (2012)

  • Fermi (from Heng Li), also published last year: Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Heng Li, Bioinformatics (2012)

0

Has GATK published a paper yet?

4
11.3 years ago
William ★ 5.3k

GATK HaplotypeCaller is kind of slow because it does variant calling based on a sliding-window local de novo assembly. That is also where the advantages of the HaplotypeCaller come from. I don't know of any other local de novo assembly based variant callers.

To make it run faster, you can run it on a machine with a large number of cores or on a Sun Grid Engine cluster. You can use the GATK Queue library together with a small Scala script to start the HaplotypeCaller on multiple cores, locally or on a cluster. The latest version of GATK (2.6.5) is also much faster, but you need Java 1.7 to run that version.
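One simple way to use several cores without Queue is to split the run by chromosome with the engine's -L interval argument and start the pieces side by side. The script below is only a dry-run sketch: it prints one HaplotypeCaller command per chromosome and runs nothing. The input paths are copied from the question; the output directory ./GATK/VQSR/by_chrom, the chr1..chrY naming, and the helper names hc_command, chroms and print_all_commands are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print one HaplotypeCaller command per chromosome.
# Nothing is executed; paths and chromosome names are assumptions.
set -eu

GATK_JAR="/Library/Java/Extensions/GenomeAnalysisTK.jar"
REF="./GATK_ref/hg19.fasta"
BAM="./GATK/BQSR/sample1_realign.recal.compressed.bam"
OUT_DIR="./GATK/VQSR/by_chrom"   # hypothetical output directory

# Build the command line for a single interval (here a whole chromosome).
hc_command() {
    chrom="$1"
    printf 'java -Xmx10g -jar %s -T HaplotypeCaller -R %s -I %s -L %s -o %s/sample1.%s.vcf\n' \
        "$GATK_JAR" "$REF" "$BAM" "$chrom" "$OUT_DIR" "$chrom"
}

# Assumed hg19-style names; adjust to match the reference's .fai/.dict.
chroms() {
    for i in $(seq 1 22) X Y; do
        printf 'chr%s\n' "$i"
    done
}

# Emit every per-chromosome command, one per line.
print_all_commands() {
    for c in $(chroms); do
        hc_command "$c"
    done
}

print_all_commands
```

Piping the output into something like `xargs -P 4 -I{} sh -c '{}'` would run four jobs at a time, provided the output directory exists and the machine has memory for four 10g JVMs. The per-chromosome VCFs still need to be merged afterwards. For exome data, passing the capture target intervals to -L instead of whole chromosomes should cut the work further.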

If you don't want to do this, or it is still too slow (which can happen with multi-sample calling on a large number of samples), you can use the "old" GATK UnifiedGenotyper. It is much faster but lacks the advantages of doing a local de novo assembly.
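As a rough sketch of that alternative, the script below prints (but does not run) a UnifiedGenotyper command for one sample, using the same recalibrated BAM as the question. The -glm BOTH setting asks for SNPs and indels together, and -nt sets the number of data threads. The thread count, the .ug.vcf output name and the helper name ug_command are assumptions, not something given in this thread.

```shell
#!/bin/sh
# Dry-run sketch: print a UnifiedGenotyper command for one sample.
# Nothing is executed; output naming and thread count are assumptions.
set -eu

GATK_JAR="/Library/Java/Extensions/GenomeAnalysisTK.jar"
REF="./GATK_ref/hg19.fasta"

# $1 = sample name, $2 = number of data threads for -nt
ug_command() {
    sample="$1"
    threads="$2"
    bam="./GATK/BQSR/${sample}_realign.recal.compressed.bam"
    out="./GATK/VQSR/${sample}_realign.raw.snps_indels.ug.vcf"   # hypothetical name
    printf 'java -Xmx10g -jar %s -T UnifiedGenotyper -glm BOTH -nt %s -R %s -I %s -o %s\n' \
        "$GATK_JAR" "$threads" "$REF" "$bam" "$out"
}

ug_command sample1 4
```

Because the command is only printed, it can be checked against the GATK version in use before anything is launched.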

0

As Zam notes, there are a number of methods for local de novo assembly besides the GATK. Some of them have distinct advantages, and, as I understand it, the method in the GATK is not exactly a haplotype caller, in the sense that it only uses the windowed local assembly to generate candidate alleles. Haplotypes are then inferred post hoc where linkage disequilibrium is greater than 0.95.

0

Erik, why does that mean the GATK HC is not exactly a haplotype caller?

0

It ultimately calls and reports point mutations, not haplotypes. The haplotype-based aspect of detection is driven by the de Bruijn assembly, which is used to detect possible alleles.

2
8.1 years ago
daniel ▴ 30

To expand on Zam's answer, we have just released an alpha version of Platypus' successor, octopus. By default octopus isn't much faster than GATK, but it does have an optional fast mode which gives similar runtimes to Platypus with little loss in calling accuracy. It also has built-in multithreading.

1

Link is dead. Any word on the octopus project?

0

Octopus is back online - the link should now work.
