Question

Is there an easy-to-use GATK pipeline for SNP calling?

8

6.6 years ago

Chen Sun ★ 1.1k

With full respect, GATK is a good tool for SNP calling, but the tutorial on the GATK website is too complex and I get lost in the details.

Is there an easy-to-use list of GATK commands for SNP calling that I can copy and paste, changing only the input file names and maybe a few parameters?

GATK • 15k views

ADD COMMENT • link updated 4.0 years ago by ashotmarg2004 ▴ 130 • written 6.6 years ago by Chen Sun ★ 1.1k

0

Hello,

What's the problem with the Tool Documentation?

I guess you tried to go through the best practice guide and got lost somewhere in it? To begin with, it's OK to start with just the command for a variant call using HaplotypeCaller. But I would recommend reading more about the whole "pipeline thing" (not only the best practice guide, though that's a good starting point). Depending on what you are trying to analyse, there is much more to do than just typing in the command for a variant call.

Please feel free to ask a specific question if you don't understand a certain point.

fin swimmer

ADD REPLY • link 6.6 years ago by finswimmer 16k

0

Hi swimmer, thank you very much for the suggestions. Do you think https://gencore.bio.nyu.edu/variant-calling-pipeline/ is a good command pipeline that I can follow? This is the kind of pipeline I am looking for, but I am not sure whether it is missing something important.

ADD REPLY • link 6.6 years ago by Chen Sun ★ 1.1k

4

That pipeline might be a bit old. I don't think you need to do the RealignerTargetCreator/IndelRealigner steps for indels anymore, as HaplotypeCaller will do that now.

The general steps for me are:

1. trim reads
2. bwa mem align to genome
3. mark duplicates
4. use HaplotypeCaller to generate gvcf
5. CombineGVCFs
6. GenotypeGVCFs on the combined gvcf
7. filter your vcf however you want

You can do base recalibration iteratively now if you want with the filtered vcf.
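
As a rough illustration, the mark-duplicates, HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs steps could be sketched as a shell script like the one below. It assumes GATK4 command syntax and coordinate-sorted BAMs that already carry read groups (for example from `bwa mem -R`). The file names, the sample names, and the `DRY_RUN` switch are made up for this example and are not from the thread. By default it only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Sketch of the mark-duplicates -> gVCF -> joint-genotyping steps (GATK4 syntax).
# Assumptions (not from the thread): file names, sample names, and the
# DRY_RUN switch, which only prints the commands instead of running them.
set -eu

REF=${REF:-ref.fasta}      # hypothetical reference; needs .fai and .dict next to it
DRY_RUN=${DRY_RUN:-1}      # 1 = print commands only, 0 = also execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

call_sample() {
    # $1 = sample name; expects a coordinate-sorted $1.sorted.bam with read groups
    run gatk MarkDuplicates -I "$1.sorted.bam" -O "$1.dedup.bam" -M "$1.metrics.txt"
    run samtools index "$1.dedup.bam"
    # -ERC GVCF makes HaplotypeCaller write a per-sample genome VCF
    run gatk HaplotypeCaller -R "$REF" -I "$1.dedup.bam" -O "$1.g.vcf.gz" -ERC GVCF
}

joint_genotype() {
    # $@ = sample names; combine the per-sample gVCFs, then genotype them jointly
    inputs=""
    for s in "$@"; do inputs="$inputs -V $s.g.vcf.gz"; done
    # $inputs is left unquoted on purpose so it splits into separate arguments
    run gatk CombineGVCFs -R "$REF" $inputs -O cohort.g.vcf.gz
    run gatk GenotypeGVCFs -R "$REF" -V cohort.g.vcf.gz -O cohort.vcf.gz
}

for s in sampleA sampleB; do call_sample "$s"; done
joint_genotype sampleA sampleB
```

Run it once as-is to inspect the printed commands, then rerun with `DRY_RUN=0` once the names match your data. Trimming, alignment, and filtering are left out because they depend heavily on the data set.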

And yes, their tutorials are a bit of a mess. Their best practice guide is badly organized. You have to dig around a lot.

ADD REPLY • link 6.6 years ago by Damian Kao 16k

0

Hi, Damian! I like the steps you mentioned a lot; I was just looking for something like this. By "mark duplicates", did you mean marking the duplicated reads using MarkDuplicates (Picard)? And should duplicated/recombinant regions be removed from the reference as well, or does that happen naturally when MarkDuplicates runs?

ADD REPLY • link 6.3 years ago by lutra007 • 0

1

Yes, I usually just use picardtools' MarkDuplicates. Duplicate/recombinant regions are tricky to deal with. It might be better to do some kind of de novo assembly of those regions specifically if that's what you want to study.

ADD REPLY • link 6.3 years ago by Damian Kao 16k

0

Dear Damian~ I'm trying to understand best practices for variant calling. Following alignment and marking duplicates, does each individual need to have variants called before calling variants across multiple samples (step 6)?

ADD REPLY • link 6.3 years ago by emilyepuckett • 0

0

GATK best practices suggest creating a genome VCF (g.vcf) for each individual, combining the g.vcfs, and then doing joint calling. These are steps 4, 5, and 6 in my comment.

A genome VCF is different from a normal VCF in that it will also output information on positions that are not different from the reference. You want this information when you eventually do a joint-calling among all samples so you can make the comparison with other samples where there is a difference to reference at that position. I would read up on g.vcfs if you want more info.
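
To see this difference in a file, a small sketch like the following counts reference-block records versus variant-site records in a gVCF. It assumes the gVCF was written by GATK HaplotypeCaller with `-ERC GVCF`, where reference blocks have `<NON_REF>` as their only ALT allele. The function name `summarize_gvcf` and the three example records are made up for illustration.

```shell
#!/bin/sh
# Count reference-block records versus variant-site records in a gVCF.
# Assumption: the gVCF comes from GATK HaplotypeCaller with -ERC GVCF, which
# writes reference blocks with ALT = <NON_REF> and an END= tag in INFO.
summarize_gvcf() {
    # Reads uncompressed gVCF text on stdin (pipe .gz files through `gzip -dc`).
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        $5 == "<NON_REF>" { nref++; next }     # block with no evidence of variation
        { nvar++ }                             # site with a candidate ALT allele
        END { printf "ref_blocks=%d variant_sites=%d\n", nref, nvar }
    '
}

# Tiny hand-made example (hypothetical records, not from a real sample).
{
    printf '##fileformat=VCFv4.2\n'
    printf 'chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199\tGT:DP\t0/0:30\n'
    printf 'chr1\t200\t.\tC\tT,<NON_REF>\t50\t.\t.\tGT:DP\t0/1:28\n'
    printf 'chr1\t201\t.\tG\t<NON_REF>\t.\t.\tEND=350\tGT:DP\t0/0:31\n'
} | summarize_gvcf
# prints: ref_blocks=2 variant_sites=1
```

A normal VCF of the same sample would contain only the variant site; the two reference blocks are the extra information that joint calling uses.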

ADD REPLY • link 6.3 years ago by Damian Kao 16k

0

Which GATK command did you use to call variants? I tried a lot but was not able to generate a VCF file.

Here are my commands:

./gatk --java-options "-Xmx4g" HaplotypeCaller -R ../sequence.fasta -I ../trt.bam -O ../output.vcf.gz

I also tried this command:

./gatk HaplotypeCaller -R ../sequence.fasta -I ../sorted-trt.bam -O ../variants-trt.vcf
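
The post does not include the error message, so the cause is unknown. One thing worth ruling out is a missing companion file: HaplotypeCaller expects a FASTA index (`.fai`) and a sequence dictionary (`.dict`) next to the reference, plus an index for the BAM. The sketch below only checks for those files. The function name `check_inputs` is made up for this example. The BAM also needs read groups, which you can inspect with `samtools view -H ../trt.bam | grep '^@RG'`.

```shell
#!/bin/sh
# Pre-flight check for the companion files HaplotypeCaller expects.
# This is only a guess at common causes; the thread does not show the error.
check_inputs() {
    # $1 = reference FASTA, $2 = BAM. Prints what is missing; returns 1 if anything is.
    ref=$1; bam=$2; missing=0
    dict="${ref%.*}.dict"      # e.g. sequence.fasta -> sequence.dict
    [ -f "$ref.fai" ] || { echo "missing $ref.fai (create with: samtools faidx $ref)"; missing=1; }
    [ -f "$dict" ] || { echo "missing $dict (create with: gatk CreateSequenceDictionary -R $ref)"; missing=1; }
    [ -f "$bam.bai" ] || [ -f "${bam%.bam}.bai" ] || { echo "missing BAM index (create with: samtools index $bam)"; missing=1; }
    return "$missing"
}

# Paths taken from the commands above.
check_inputs ../sequence.fasta ../trt.bam || echo "fix the items listed above, then rerun HaplotypeCaller"
```

If all three files are present and the BAM has read groups, the GATK log output is the next place to look.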

ADD REPLY • link 4.5 years ago by Kumar ▴ 170

1

Hi Chen, the pipeline that you mentioned by NYU seems fine. It appears to be mostly for internal use, though. You don't appear to be based at NYU...?

ADD REPLY • link 6.2 years ago by Kevin Blighe 88k

1

An updated version of this pipeline using GATK4 is now available here: https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/

It's available as a Nextflow script on GitHub and is fully Dockerized, so anyone outside of NYU can now use this same pipeline.

ADD REPLY • link 4.6 years ago by mk5636 ▴ 10

0

I agree - It would be awesome if GATK could be used through a front-end application or were more user-friendly!

ADD REPLY • link 6.5 years ago by gaelgarcia ▴ 270

score 2 · Answer 1 · 2020-12-03

2

4.0 years ago

ashotmarg2004 ▴ 130

Maybe check out e.g. these: https://evodify.com/gatk-in-non-model-organism/ https://learn.gencore.bio.nyu.edu/variant-calling/variant-discovery/

ADD COMMENT • link 4.0 years ago by ashotmarg2004 ▴ 130

score 1 · Answer 2 · 2019-01-18

1

5.8 years ago

francescomusacchia ▴ 70

We have written, and provide as open source, software that does exactly what you want. You can find it here:

https://github.com/frankMusacchia/VarGenius

But you can run it only on a cluster.

Regards

ADD COMMENT • link 5.8 years ago by francescomusacchia ▴ 70

score 1 · Answer 3 · 2019-04-05

1

5.6 years ago

johannes.koester ▴ 20

There is an easy-to-use, reproducible Snakemake workflow: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling

ADD COMMENT • link 5.6 years ago by johannes.koester ▴ 20