BWA and GATK Runtimes
12.1 years ago
aniketd86 ▴ 150

I am trying to optimize an NGS pipeline for human genome sequence analysis, and I would like to know typical runtimes for BWA and GATK. Could you also specify:

  1. The data type, e.g. human genomes at ~30-50x coverage from Illumina, preferably paired-end.
  2. The number of cores and nodes used for BWA.
  3. The parameters used for BWA and GATK.

Also, is there a comparative analysis of BWA and GATK with respect to runtime and optimisation? I recently came across a SEQanswers thread that compares BWA and Bowtie.

http://seqanswers.com/forums/showthread.php?t=15200

Thanks,
AD

gatk parallel bwa • 7.0k views

According to the GATK data processing pipeline documentation, the alignment is done with BWA.

http://gatkforums.broadinstitute.org/discussion/41/data-processing-pipeline

This page lists "mapping" as a non-GATK step, so I assume GATK has no built-in read aligner (I don't use GATK myself).

12.1 years ago

GATK does not perform the primary short-read mapping step, and it recommends that you use BWA for that. In my experience, mapping ~120x Illumina 150 bp paired-end exomes with BWA on 4 cores and then running the v3 best practices from GATK takes about 12 hours of compute time per exome. On a 12-core machine, I can run 3 exomes simultaneously in about 12 GB of RAM. There is also a mapping step that uses SRMA to realign reads around indels; these indel regions are estimated from the average insert size. Hope this helps.
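To turn those figures into a rough batch estimate, here is a minimal sketch of the arithmetic. The function name and its parameters are hypothetical. It assumes every sample takes the same time and that running samples side by side does not slow any of them down (no I/O or memory contention), which may not hold on real hardware.

```python
import math


def batch_wall_hours(n_samples, total_cores, cores_per_sample, hours_per_sample):
    """Estimate wall-clock hours for a batch processed in parallel 'waves'.

    Assumes identical per-sample runtime and no slowdown from concurrency.
    """
    # How many samples fit on the machine at once
    slots = total_cores // cores_per_sample
    if slots < 1:
        raise ValueError("not enough cores to run even one sample")
    # Each wave processes up to `slots` samples and takes one sample's runtime
    waves = math.ceil(n_samples / slots)
    return waves * hours_per_sample


# Figures quoted above: 12 cores, 4 cores per exome, ~12 hours per exome
print(batch_wall_hours(3, 12, 4, 12))   # 3 exomes fit in one wave: 12
print(batch_wall_hours(10, 12, 4, 12))  # 10 exomes need 4 waves: 48
```

Whole genomes at 30-50x will have very different per-sample numbers, so treat this only as a way to scale a runtime you have measured yourself.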

12.1 years ago
vdauwera ★ 1.2k

As Philipp said, GATK does not include a de novo aligner as such (there are realignment steps, but that's a completely different thing), so GATK and BWA cannot be compared -- they do very different things.

GATK can take the output of any aligner as long as it makes spec-compliant BAM files. At the Broad Institute, genomes are aligned with BWA, then processed with GATK and some Picard tools, using workflows illustrated here:

http://www.broadinstitute.org/gatk/about/#typical-workflows

Runtimes are going to vary massively depending on your hardware and how parallel you can go. The best way to evaluate it is to run some test data through your setup and see what happens. If you use GATK, you can parallelize using Queue as explained here:

http://www.broadinstitute.org/gatk/about/#high-performance
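For the test-run approach suggested above, a small timing harness could look something like the sketch below. It is not part of GATK or Queue. The helper name `time_command` is made up for illustration, and the command in the example is a placeholder so that the script runs anywhere; you would substitute your own pipeline step and a small test dataset.

```python
import subprocess
import sys
import time


def time_command(cmd, repeats=1):
    """Run `cmd` (a list of arguments) `repeats` times.

    Returns the wall-clock seconds of each run. Standard output is
    discarded here; for a real aligner run you would write it to a file.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command. A real test might look like
    # ["bwa", "mem", "-t", "4", "ref.fa", "reads_1.fq", "reads_2.fq"],
    # where the file names are hypothetical.
    cmd = [sys.executable, "-c", "print('hello')"]
    for i, seconds in enumerate(time_command(cmd, repeats=3), start=1):
        print(f"run {i}: {seconds:.3f} s")
```

Repeating the same step with different thread counts on the same subset of reads gives a first idea of how well it scales on your hardware before you commit to a full genome.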

Good luck!
