Why doesn't GATK4 carry out local realignment around indels?
2.9 years ago
ManuelDB ▴ 110

I am a trainee working with GATK. In the clinical lab where I work, we use GATK3, and I am learning to create my own pipeline.

In the in-house pipeline we perform local realignment around indels using the tools RealignerTargetCreator and IndelRealigner. However, the GATK4 Best Practices do not mention this step. Why?

After marking duplicates, GATK4 jumps straight to Base (Quality Score) Recalibration, whereas in the in-house pipeline we mark duplicates, perform local realignment around indels, and then recalibrate.

GATK4 link https://gatk.broadinstitute.org/hc/en-us/articles/360035535912-Data-pre-processing-for-variant-discovery
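To make the difference between the two step orders concrete, here is a minimal Python sketch. It only builds the GATK4 command lines for duplicate marking and base recalibration; it does not run them. The file names (`sample.bam`, `dedup.bam`, `recal.table` and so on) and the helper name `gatk4_preprocessing_commands` are hypothetical placeholders, and the in-house step list is taken from the description above rather than from any real pipeline file.

```python
# Step order of the in-house GATK3 pipeline, as described in the question.
IN_HOUSE_GATK3_STEPS = [
    "MarkDuplicates",
    "RealignerTargetCreator",
    "IndelRealigner",
    "BaseRecalibrator",
]


def gatk4_preprocessing_commands(reference, bam, known_sites):
    """Build (but do not run) GATK4 command lines for duplicate marking and BQSR.

    Intermediate file names are hypothetical placeholders.
    """
    return [
        ["gatk", "MarkDuplicates", "-I", bam, "-O", "dedup.bam",
         "-M", "dup_metrics.txt"],
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", "dedup.bam",
         "--known-sites", known_sites, "-O", "recal.table"],
        ["gatk", "ApplyBQSR", "-R", reference, "-I", "dedup.bam",
         "--bqsr-recal-file", "recal.table", "-O", "recal.bam"],
    ]


commands = gatk4_preprocessing_commands(
    "ref.fasta", "sample.bam", "known_sites.vcf.gz"
)
gatk4_tools = [cmd[1] for cmd in commands]

# Steps of the in-house pipeline that have no counterpart in the GATK4 list.
dropped = [step for step in IN_HOUSE_GATK3_STEPS if step not in gatk4_tools]

for cmd in commands:
    print(" ".join(cmd))
print("In-house steps with no GATK4 counterpart:", dropped)
```

The only in-house steps without a counterpart in this sketch are the two realignment tools, which is exactly the gap the question is about.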

GATK • 4.7k views

There was a previous discussion regarding this: Realignment disappeared in gatk4

2.9 years ago
vdauwera ★ 1.2k

It's not a GATK 3 vs 4 difference. The indel realignment step was deprecated in 2016 (GATK version at the time was 3.6), when we validated that the realignment done by HaplotypeCaller made separate indel realignment unnecessary (as Pierre points out in his answer). So you can drop indel realignment from any pipeline using GATK 3.6 or later as long as you're using a haplotype-based caller (in GATK that's HaplotypeCaller for germline short variants and Mutect2 for somatic short variants).
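The rule in the paragraph above can be written down as a small check. This is only a sketch of that rule of thumb, not anything shipped with GATK: the function names are hypothetical, and the version parsing assumes version strings that start with `major.minor` (for example `3.6`, `3.8-1` or `4.1.9.0`).

```python
import re

# Callers named above as haplotype-based (they realign reads themselves).
HAPLOTYPE_BASED_CALLERS = {"HaplotypeCaller", "Mutect2"}


def parse_major_minor(version):
    """Extract (major, minor) from strings such as '3.6', '3.8-1' or '4.1.9.0'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized GATK version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def can_drop_indel_realignment(gatk_version, caller):
    """Apply the rule of thumb: GATK >= 3.6 and a haplotype-based caller."""
    new_enough = parse_major_minor(gatk_version) >= (3, 6)
    return new_enough and caller in HAPLOTYPE_BASED_CALLERS


print(can_drop_indel_realignment("3.8-1", "HaplotypeCaller"))
print(can_drop_indel_realignment("4.1.9.0", "Mutect2"))
# UnifiedGenotyper is used here as an example of a caller that is not
# haplotype-based, so realignment would still be kept for it.
print(can_drop_indel_realignment("3.8-1", "UnifiedGenotyper"))
```

For a GATK3 pipeline like the one in the question, this means the realignment step can go as soon as the version is 3.6 or later and calling is done with HaplotypeCaller or Mutect2.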

This is the blog post that announced the change (from our blog archive): https://github.com/broadinstitute/gatk-docs/blob/master/blog-2012-to-2019/2016-06-21-Changing_workflows_around_calling_SNPs_and_indels.md?id=7847

You can find an up-to-date, in-depth description of the GATK Best Practices pipelines, along with explanations of what each step does and hands-on exercises that demonstrate their usage, in the O'Reilly book "Genomics in the Cloud", which I coauthored with Brian O'Connor in 2020 (despite the title, most of the book applies whether you're working locally on a laptop, on an HPC, or on the cloud).

https://oreil.ly/genomics-cloud


There is a remote possibility it might be coming back: https://github.com/broadinstitute/gatk/issues/3104


Not exactly -- that discussion is about someone potentially implementing the tools in GATK4, not actually bringing the step back into the Best Practices pipelines. The goal would be to enable people who have a very different use case and/or use a different variant caller that doesn't do realignment to be able to do it with GATK4. But there is zero chance of indel realignment coming back to the Best Practices for short variants.


Yes, I was just referring to having the tool be available at all, not necessarily as Best Practices. It would be great to have it as an option for non-standard workflows.


Oh I see, I misinterpreted your comment, sorry! I agree it would be nice to have for those cases.


Many thanks, Geraldine, for your comment and for the book recommendation. It's going to be nice reading for this Christmas.

2.9 years ago

Because GATK4 HaplotypeCaller realigns the reads on the fly.

https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller

From the linked documentation: "The program then realigns each haplotype against the reference haplotype using the Smith-Waterman algorithm in order to identify potentially variant sites."
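For readers who have not met Smith-Waterman before, here is a toy Python version of the local alignment score it computes. This is not GATK's implementation: the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the linear gap penalty are assumptions chosen to keep the sketch short, and the sequences are made up.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Made-up reference and a candidate haplotype carrying a 2-base deletion.
reference = "GATTACAGGCTTACA"
haplotype = "GATTACACTTACA"
print(smith_waterman_score(haplotype, reference))
```

Because a haplotype containing an indel can still align well to the reference with a gap, the caller can place the indel consistently across reads, which is the job the separate realignment step used to do.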
