Question

Seeking Advice on Multisample Variant Calling Pipeline with BWA-MEME and GATK DRAGEN Mode


9 months ago

George ▴ 10

Hello everyone,

I am in the process of building a pipeline for multisample variant calling and have some questions about my choice of tools and how reliable they are. Specifically, I would like to discuss mapping and variant calling.

Mapping:

I have mapped my reads using BWA-MEME (not BWA-MEM) from this repository: BWA-MEME GitHub.
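To check on your own data that BWA-MEME gives the same alignments as BWA-MEM, you could map the same reads with both tools and compare the core SAM fields. Below is a minimal sketch that does this, under some assumptions. It takes SAM text (convert a BAM with `samtools view -h` first) and compares only QNAME, FLAG, RNAME, POS, MAPQ and CIGAR. It ignores the header, where the `@PG` lines differ by design, and it ignores optional tags. The function and file names are hypothetical, not part of either tool.

```python
from collections import Counter
from typing import Iterable, Tuple


def core_records(sam_lines: Iterable[str]) -> Counter:
    """Count core alignment tuples in SAM text, ignoring header and optional tags."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@PG differs between aligners)
        f = line.rstrip("\n").split("\t")
        # QNAME, FLAG, RNAME, POS, MAPQ, CIGAR: the fields that define the alignment
        counts[(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5])] += 1
    return counts


def alignment_differences(sam_a: Iterable[str], sam_b: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (records only in A, records only in B); both empty means identical alignments."""
    a, b = core_records(sam_a), core_records(sam_b)
    return a - b, b - a


# Usage sketch (hypothetical file names):
# with open("meme.sam") as fa, open("mem.sam") as fb:
#     only_meme, only_mem = alignment_differences(fa, fb)
#     print(sum(only_meme.values()), "records only in the BWA-MEME output")
```

The records are compared as multisets, so the two files do not need to be sorted the same way. Supplementary and secondary records are counted like any other record.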

Variant Calling:

For variant calling, I am using GATK HaplotypeCaller with DRAGEN mode enabled. Afterward, I consolidate the per-sample GVCFs into a GenomicsDB. I follow GATK's best practices, assuming that BWA-MEME produces output similar to that of BWA-MEM (BWA-MEM2). Here is the reference I use for the DRAGEN pipeline: GATK DRAGEN Mode Guide.
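As a rough sketch, the DRAGEN-mode workflow for a cohort can be written as a sequence of GATK commands, built here as Python argument lists. It builds one STR table per reference and a calibrated DRAGstr model per sample. It then runs HaplotypeCaller in DRAGEN mode in GVCF mode, imports the GVCFs into a GenomicsDB, and finishes with joint genotyping. The tool names and flags (`ComposeSTRTableFile`, `CalibrateDragstrModel -str`, `--dragen-mode`, `--dragstr-params-path`, `GenomicsDBImport`, `GenotypeGVCFs`) follow the GATK DRAGEN-mode documentation as I understand it. Please verify them against `gatk <Tool> --help` for your GATK version, including whether DRAGEN mode supports `-ERC GVCF` there. The helper name and the file-naming scheme are hypothetical.

```python
from pathlib import Path


def dragen_gatk_commands(ref: str, bams: list[str], intervals: str,
                         workspace: str = "cohort_db") -> list[list[str]]:
    """Build (but do not run) the GATK commands for a DRAGEN-mode joint-calling run."""
    cmds: list[list[str]] = []
    str_table = "str_table.tsv"
    # 1. Once per reference: tabulate short tandem repeats
    cmds.append(["gatk", "ComposeSTRTableFile", "-R", ref, "-O", str_table])
    gvcfs: list[str] = []
    for bam in bams:
        sample = Path(bam).stem  # hypothetical naming: sample name = BAM file stem
        model = f"{sample}.dragstr_model.txt"
        gvcf = f"{sample}.g.vcf.gz"
        # 2. Per sample: calibrate the DRAGstr model on that sample's reads
        cmds.append(["gatk", "CalibrateDragstrModel", "-R", ref, "-I", bam,
                     "-str", str_table, "-O", model])
        # 3. Per sample: call in DRAGEN mode and emit a GVCF
        cmds.append(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", gvcf,
                     "-ERC", "GVCF", "--dragen-mode", "true",
                     "--dragstr-params-path", model])
        gvcfs.append(gvcf)
    # 4. Consolidate all GVCFs into a GenomicsDB workspace (intervals are required)
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", intervals]
    for g in gvcfs:
        import_cmd += ["-V", g]
    cmds.append(import_cmd)
    # 5. Joint genotyping directly from the workspace
    cmds.append(["gatk", "GenotypeGVCFs", "-R", ref,
                 "-V", f"gendb://{workspace}", "-O", "cohort.vcf.gz"])
    return cmds


# Dry run: print the commands instead of executing them
# for cmd in dragen_gatk_commands("ref.fa", ["s1.bam", "s2.bam"], "chr1"):
#     print(" ".join(cmd))
```

To actually execute the steps, pass each list in order to `subprocess.run(cmd, check=True)`. Printing them first is a cheap way to review the pipeline before committing compute time.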

Questions:

Do you have any opinions or experiences with these tools?
Is it problematic to use DRAGEN mode? From my understanding, it performs better, at least for per-sample calling before the GVCFs are merged (GATK-DRAGEN).

I found some literature that supports the use of these tools: Performance Evaluation of Variant Calling Tools.

I would greatly appreciate any insights or advice on this matter.

Thank you!!

Tags: pipeline, GATK, Variant-Calling, dragen, bwa


I wonder if GATK+DRAGEN is still under active development...

9 months ago by Pierre Lindenbaum 166k


Haha, honestly, with all due respect to the GATK developers, it feels like GATK is always in some form of beta... and let's not even get started on Spark. Still, despite all the bugs and troubleshooting, I believe the GATK "suite" performs better in most cases.

9 months ago by George ▴ 10

score 3 · Accepted Answer · 2024-07-23


9 months ago

DBScan ▴ 480

Good to see people actually read my paper :). To your questions:

BWA-MEME should produce identical output to BWA-MEM. Running GATK in DRAGEN mode seems like a good idea. If you don't have too many samples, or you have a beefy HPC available, you can also run variant calling with DeepVariant and then combine the resulting gVCFs with GLnexus.
I've never used GATK in DRAGEN mode except for the benchmark, but, as you also observed, it performs better than the standard GATK HaplotypeCaller.
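A minimal sketch of the DeepVariant + GLnexus route, again as Python argument lists. DeepVariant runs once per sample and writes a gVCF. GLnexus then jointly genotypes all gVCFs, using its `DeepVariant` configuration preset. The flags shown (`--model_type`, `--ref`, `--reads`, `--output_vcf`, `--output_gvcf`, `--num_shards`, and `glnexus_cli --config`) are the commonly documented ones. The assumptions are that `run_deepvariant` is on the PATH (in the official Docker image it lives under `/opt/deepvariant/bin/`) and that the BAM file stem is the sample name. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def deepvariant_glnexus_commands(ref: str, bams: list[str], model_type: str = "WGS",
                                 shards: int = 8) -> tuple[list[list[str]], list[str]]:
    """Build per-sample DeepVariant commands and the GLnexus merge command (not run here)."""
    per_sample: list[list[str]] = []
    gvcfs: list[str] = []
    for bam in bams:
        sample = Path(bam).stem  # hypothetical naming: sample name = BAM file stem
        gvcf = f"{sample}.g.vcf.gz"
        per_sample.append([
            "run_deepvariant",
            f"--model_type={model_type}",
            f"--ref={ref}",
            f"--reads={bam}",
            f"--output_vcf={sample}.vcf.gz",
            f"--output_gvcf={gvcf}",  # the gVCF is what GLnexus needs for joint calling
            f"--num_shards={shards}",
        ])
        gvcfs.append(gvcf)
    # GLnexus jointly genotypes the gVCFs; the "DeepVariant" preset targets DeepVariant output
    merge = ["glnexus_cli", "--config", "DeepVariant", *gvcfs]
    return per_sample, merge


def run_merge(merge_cmd: list[str], out_bcf: str = "cohort.bcf") -> None:
    """glnexus_cli writes BCF to stdout, so redirect it into a file."""
    with open(out_bcf, "wb") as fh:
        subprocess.run(merge_cmd, stdout=fh, check=True)
```

The per-sample commands are independent, which is why this route scales well on an HPC: each one can be submitted as its own job. The single merge step runs only after all gVCFs exist.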

I would say that these days the accuracy of your variant calls depends more on mapping and alignment than on the variant calling itself. The pangenome approach in particular seems like a good way to improve accuracy.


First of all, thank you for your response and your excellent work!

I will try merging the VCF with the DeepVariant results.
Unfortunately, since I am working with non-human samples, a pangenome is not yet available. Additionally, I need to compare the results with a database that was built using an older reference.

9 months ago by George ▴ 10