Seeking Advice on Multisample Variant Calling Pipeline with BWA-MEME and GATK DRAGEN Mode
6 weeks ago
George ▴ 10

Hello everyone,

I am building a pipeline for multisample variant calling and have some questions about my choice of tools and whether they are safe to use. Specifically, I would like to discuss mapping and variant calling.

Mapping:

I have mapped my reads using BWA-MEME (not BWA-MEM) from this repository: BWA-MEME GitHub.
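A quick sanity check when swapping in an alternative mapper such as BWA-MEME is to align a small subset of reads with both BWA-MEME and BWA-MEM and compare the resulting records. The following is only a minimal sketch in plain Python, assuming uncompressed SAM text as input; the function names are hypothetical, and it compares only the placement fields (RNAME, POS, MAPQ, CIGAR), not tags or header lines.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]                  # (QNAME, FLAG)
Placement = Tuple[str, int, int, str]  # (RNAME, POS, MAPQ, CIGAR)


def sam_placements(lines: Iterable[str]) -> Dict[Key, Placement]:
    """Collect placement fields per record, skipping '@' header lines.

    Limitation: records sharing both QNAME and FLAG (for example several
    supplementary alignments) overwrite each other in this sketch.
    """
    placements: Dict[Key, Placement] = {}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))
        placements[key] = (fields[2], int(fields[3]), int(fields[4]), fields[5])
    return placements


def differing_records(sam_a: Iterable[str], sam_b: Iterable[str]) -> List[Key]:
    """Return the (QNAME, FLAG) keys whose placement differs or is missing."""
    a = sam_placements(sam_a)
    b = sam_placements(sam_b)
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))
```

Passing two open SAM file handles (one per mapper) to `differing_records` returns an empty list when every record is placed identically.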

Variant Calling:

For variant calling, I am using GATK HaplotypeCaller with DRAGEN mode enabled. Afterward, I import the per-sample GVCFs into a GenomicsDB workspace to consolidate them. I follow GATK's best practices, assuming that BWA-MEME produces output equivalent to that of BWA-MEM and BWA-MEM2. Here is the reference I use for the DRAGEN pipeline: GATK DRAGEN Mode Guide.
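For readers following along, the variant calling steps described above can be sketched as a list of GATK command lines. This is a rough outline rather than the exact commands used in this pipeline: the function name, file names, and workspace path are hypothetical, the STR table and DRAGstr calibration steps are taken from the general DRAGEN-mode workflow, and the final GenotypeGVCFs step is an assumption about what follows the GenomicsDB import.

```python
from typing import Dict, List


def dragen_gatk_commands(
    reference: str,
    bams: Dict[str, str],          # sample name -> BAM path (hypothetical inputs)
    interval: str,                 # GenomicsDBImport needs at least one interval
    workspace: str = "genomicsdb_workspace",
) -> List[List[str]]:
    """Build (but do not run) GATK command lines for a DRAGEN-mode cohort."""
    str_table = "str_table.tsv"
    commands = [["gatk", "ComposeSTRTableFile", "-R", reference, "-O", str_table]]
    gvcfs = []
    for sample, bam in bams.items():
        model = f"{sample}.dragstr_model.txt"
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        # Per-sample DRAGstr model, then DRAGEN-mode calling in GVCF mode.
        commands.append(["gatk", "CalibrateDragstrModel", "-R", reference,
                         "-I", bam, "-str", str_table, "-O", model])
        commands.append(["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
                         "-O", gvcf, "-ERC", "GVCF",
                         "--dragen-mode", "--dragstr-params-path", model])
    # Consolidate all per-sample GVCFs into one GenomicsDB workspace.
    import_cmd = ["gatk", "GenomicsDBImport",
                  "--genomicsdb-workspace-path", workspace, "-L", interval]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    commands.append(import_cmd)
    # Assumed next step: joint genotyping from the workspace.
    commands.append(["gatk", "GenotypeGVCFs", "-R", reference,
                     "-V", f"gendb://{workspace}", "-O", "cohort.vcf.gz"])
    return commands
```

Each inner list can be handed to `subprocess.run(cmd, check=True)` once the paths are adapted; building the lists first makes it easy to print and review the commands before anything is executed.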

Questions:

  1. Do you have any opinions or experiences with these tools?
  2. Is it problematic to use DRAGEN mode? From my understanding, it performs better, at least at the per-sample stage before the GVCFs are merged (GATK-DRAGEN).

I found some literature that supports the use of these tools: Performance Evaluation of Variant Calling Tools.

I would greatly appreciate any insights or advice on this matter.

Thank you!!

I wonder if GATK+DRAGEN is always under active development...


Haha, honestly, with all due respect to the GATK developers, it feels like GATK is always in some form of beta... and let's not even get started on Spark. Still, despite all the bugs and troubleshooting, I believe the GATK "suite" performs better in most cases.

6 weeks ago
DBScan ▴ 440

Good to see people actually read my paper :). To your questions:

  1. BWA-MEME should produce identical output to BWA-MEM. Running GATK in DRAGEN mode seems like a good idea. If you don't have too many samples, or you have a beefy HPC available, you can also run variant calling with DeepVariant and then combine the per-sample gVCFs with GLnexus.
  2. I've never used GATK in DRAGEN mode except for the benchmark, but from what you also observed it performs better than standard GATK HaplotypeCaller.
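The DeepVariant plus GLnexus alternative mentioned in point 1 can be sketched in the same spirit. Treat this as an outline under assumptions: the function names and output file names are hypothetical, DeepVariant is usually launched through its container image (that wrapper is omitted here), and the `WGS` model type and `DeepVariantWGS` GLnexus preset are the stock whole-genome settings, which were trained and tuned on human data.

```python
from typing import List


def deepvariant_command(reference: str, bam: str, sample: str,
                        shards: int = 8) -> List[str]:
    """Build a per-sample run_deepvariant call that also emits a gVCF."""
    return [
        "run_deepvariant",
        "--model_type=WGS",            # assumption: whole-genome short reads
        f"--ref={reference}",
        f"--reads={bam}",
        f"--output_vcf={sample}.vcf.gz",
        f"--output_gvcf={sample}.g.vcf.gz",
        f"--num_shards={shards}",
    ]


def glnexus_command(gvcfs: List[str]) -> List[str]:
    """Build the joint-calling call; glnexus_cli writes BCF to stdout."""
    return ["glnexus_cli", "--config", "DeepVariantWGS", *gvcfs]


samples = {"s1": "s1.bam", "s2": "s2.bam"}   # hypothetical inputs
per_sample = [deepvariant_command("ref.fa", bam, name)
              for name, bam in samples.items()]
joint = glnexus_command([f"{name}.g.vcf.gz" for name in samples])
```

Because `glnexus_cli` writes the merged BCF to standard output, the joint command's stdout needs to be redirected to a file (or piped into `bcftools view`) when it is actually run.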

I would say that these days the accuracy of your variant calls depends more on the mapping and alignment than on the variant calling itself. The pangenome approach in particular seems like a good way to improve accuracy.


First of all, thank you for your response and your excellent work!

  1. I will try merging the VCF with the DeepVariant results.
  2. Unfortunately, since I am working with non-human samples, a pangenome is not yet available. Additionally, I need to compare the results with a database that was built using an older reference.
