Is there an easy-to-use GATK pipeline for SNP calling?
6.7 years ago
Chen Sun ★ 1.1k

With all due respect, GATK is a good tool for SNP calling, but the tutorials on the GATK website are too complex and I get lost in the details.

Is there an easy-to-use list of GATK commands for SNP calling that I can copy and paste, changing only the input file names and maybe a few parameters?


Hello,

What's the problem with the Tool Documentation?

I guess you tried to go through the Best Practices guide and got lost somewhere? To begin with, it's fine to start with just the HaplotypeCaller command for variant calling. But I would recommend reading more about the whole "pipeline thing" (not only the Best Practices guide, though that's a good starting point). Depending on what you are trying to analyse, there is much more to do than just typing in the variant-calling command.

Please feel free to ask a specific question if you don't understand a certain point.

fin swimmer


Hi swimmer, thank you very much for the suggestions. Do you think https://gencore.bio.nyu.edu/variant-calling-pipeline/ is a good pipeline of commands that I can follow? This is the kind of pipeline I am looking for, but I am not sure whether it misses something important.


That pipeline might be a bit old. I don't think you need to run the indel realignment steps (RealignerTargetCreator/IndelRealigner) anymore, as HaplotypeCaller now handles that through its own local reassembly.
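If you want to see that local reassembly for yourself, HaplotypeCaller can write the reads it reassembled in a region to a BAM with -bamout, which you can then compare in IGV with the original alignments. This is a minimal sketch in GATK4 syntax; the BAM name, region and output prefix are hypothetical, and the helper only prints the command unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command, and run it only when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Call variants in one region and also write the locally reassembled reads
# (-bamout), so they can be viewed next to the input alignments.
inspect_region() {
  local ref=$1 bam=$2 region=$3 prefix=$4
  run gatk HaplotypeCaller -R "$ref" -I "$bam" -L "$region" \
    -O "$prefix.vcf.gz" -bamout "$prefix.reassembled.bam"
}

# Example (hypothetical names):
#   inspect_region ref.fa s1.dedup.bam chr1:10000-11000 s1_region
```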

The general steps for me are:

  1. trim reads
  2. align to the genome with bwa mem
  3. mark duplicates
  4. use HaplotypeCaller to generate a gVCF per sample
  5. combine the gVCFs with CombineGVCFs
  6. run GenotypeGVCFs on the combined gVCF
  7. filter your VCF however you want
  8. optionally, run base quality score recalibration (iteratively, if you like), using the filtered VCF as known sites
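The eight steps above could be strung together roughly as follows. This is a hedged sketch in GATK4 syntax, not Damian's actual script: fastp as the trimmer, the file names, the read-group values and the hard-filter thresholds are all assumptions (the thresholds are example values in the style of GATK's generic SNP hard filters; tune them for your data). Each helper only prints its commands unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Assumptions: fastp, bwa, samtools and gatk (GATK4) are on PATH, and the
# reference was prepared once with bwa index, samtools faidx and
# gatk CreateSequenceDictionary. Commands are only printed unless DRY_RUN=0.
set -euo pipefail

REF=${REF:-ref.fa}        # hypothetical reference path
THREADS=${THREADS:-4}

# Print a command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Steps 1-4 for one sample: trim, align + sort, mark duplicates, write a gVCF.
process_sample() {
  local s=$1 r1=$2 r2=$3
  run fastp -i "$r1" -I "$r2" -o "$s.R1.trim.fq.gz" -O "$s.R2.trim.fq.gz"
  # GATK needs a read group; SM becomes the sample name in the VCF.
  run bash -c "bwa mem -t $THREADS -R '@RG\tID:$s\tSM:$s\tPL:ILLUMINA' $REF $s.R1.trim.fq.gz $s.R2.trim.fq.gz | samtools sort -o $s.sorted.bam -"
  run gatk MarkDuplicates -I "$s.sorted.bam" -O "$s.dedup.bam" -M "$s.dup_metrics.txt"
  run samtools index "$s.dedup.bam"
  run gatk HaplotypeCaller -R "$REF" -I "$s.dedup.bam" -O "$s.g.vcf.gz" -ERC GVCF
}

# Steps 5-6: combine the per-sample gVCFs, then genotype them jointly.
joint_genotype() {
  local cohort=$1; shift
  local args=() g
  for g in "$@"; do args+=(-V "$g"); done
  run gatk CombineGVCFs -R "$REF" "${args[@]}" -O "$cohort.g.vcf.gz"
  run gatk GenotypeGVCFs -R "$REF" -V "$cohort.g.vcf.gz" -O "$cohort.vcf.gz"
}

# Step 7: keep SNPs, flag them with example hard filters, keep PASS sites.
filter_snps() {
  local cohort=$1
  run gatk SelectVariants -R "$REF" -V "$cohort.vcf.gz" --select-type-to-include SNP -O "$cohort.snps.vcf.gz"
  run gatk VariantFiltration -R "$REF" -V "$cohort.snps.vcf.gz" -O "$cohort.snps.filtered.vcf.gz" \
    --filter-expression "QD < 2.0 || FS > 60.0 || MQ < 40.0" --filter-name snp_hard_filter
  run gatk SelectVariants -R "$REF" -V "$cohort.snps.filtered.vcf.gz" --exclude-filtered -O "$cohort.snps.pass.vcf.gz"
}

# Step 8 (optional, can be repeated): recalibrate base qualities using the
# PASS calls as known sites; the recalibrated BAM then goes back to step 4.
recalibrate() {
  local s=$1 known=$2
  run gatk BaseRecalibrator -R "$REF" -I "$s.dedup.bam" --known-sites "$known" -O "$s.recal.table"
  run gatk ApplyBQSR -R "$REF" -I "$s.dedup.bam" --bqsr-recal-file "$s.recal.table" -O "$s.recal.bam"
}

# Example (hypothetical file names):
#   process_sample s1 s1_R1.fq.gz s1_R2.fq.gz
#   process_sample s2 s2_R1.fq.gz s2_R2.fq.gz
#   joint_genotype cohort s1.g.vcf.gz s2.g.vcf.gz
#   filter_snps cohort
#   recalibrate s1 cohort.snps.pass.vcf.gz
```

Printing by default is a deliberate choice for a copy-and-paste pipeline: it lets you check every file name and path before starting hours of computation.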

And yes, their tutorials are a bit of a mess. The Best Practices guide is badly organized, and you have to dig around a lot.


Hi Damian! I like the steps you listed a lot; I was just looking for something like this. By "mark duplicates", did you mean marking duplicate reads with Picard's MarkDuplicates? And should duplicated/recombinant regions be removed from the reference as well, or does that happen naturally when MarkDuplicates runs?


Yes, I usually just use Picard's MarkDuplicates. Duplicated/recombinant regions are tricky to deal with; if those regions are what you want to study, it might be better to do some kind of de novo assembly of them specifically.
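To make the distinction concrete: MarkDuplicates works on reads, not on the reference. It flags reads or read pairs that align to the same position with the same orientation (typically PCR or optical copies) by setting SAM flag 0x400, and GATK tools such as HaplotypeCaller skip flagged duplicates by default; duplicated regions in the genome are left untouched. A minimal sketch with standalone Picard, assuming picard.jar sits in the working directory (PICARD_JAR is a hypothetical override variable); GATK4 also bundles the same tool as gatk MarkDuplicates. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command, and run it only when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Flag (not remove) duplicate reads in a coordinate-sorted BAM,
# e.g. the output of samtools sort.
mark_duplicates() {
  local input=$1 output=$2
  run java -jar "${PICARD_JAR:-picard.jar}" MarkDuplicates \
    I="$input" O="$output" M="${output%.bam}.dup_metrics.txt"
  run samtools index "$output"
  # Count the reads carrying the duplicate flag (0x400 = 1024).
  run samtools view -c -f 1024 "$output"
}

# Example (hypothetical names):
#   mark_duplicates s1.sorted.bam s1.dedup.bam
```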


Dear Damian, I'm trying to understand the best practices for variant calling. After alignment and duplicate marking, does each individual need to have variants called before calling variants across multiple samples (step 6)?


The GATK Best Practices suggest creating a genomic VCF (gVCF) for each individual, combining the gVCFs, and then doing joint calling. These are steps 4, 5 and 6 in my comment.

A gVCF differs from a normal VCF in that it also records information about positions that do not differ from the reference. You want this information when you eventually do joint calling across all samples: at a position where another sample differs from the reference, the joint genotyping can then tell whether this sample is confidently homozygous for the reference allele or simply had no data there. I would read up on gVCFs if you want more information.
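For several samples, steps 4 to 6 might look roughly like this sketch (GATK4 syntax). The BAM and cohort names are hypothetical, and the BAMs are assumed to be sorted, duplicate-marked, indexed and to carry read groups. The option that turns a normal call into a gVCF is -ERC GVCF, which makes HaplotypeCaller emit the reference-confidence records described above. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command, and run it only when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

# Step 4: one gVCF per sample (s1.bam -> s1.g.vcf.gz).
make_gvcfs() {
  local ref=$1; shift
  local bam
  for bam in "$@"; do
    run gatk HaplotypeCaller -R "$ref" -I "$bam" -O "${bam%.bam}.g.vcf.gz" -ERC GVCF
  done
}

# Steps 5-6: merge all gVCFs, then genotype every sample together.
joint_call() {
  local ref=$1 cohort=$2; shift 2
  local args=() g
  for g in "$@"; do args+=(-V "$g"); done
  run gatk CombineGVCFs -R "$ref" "${args[@]}" -O "$cohort.g.vcf.gz"
  run gatk GenotypeGVCFs -R "$ref" -V "$cohort.g.vcf.gz" -O "$cohort.vcf.gz"
}

# Example (hypothetical names):
#   make_gvcfs ref.fa s1.bam s2.bam s3.bam
#   joint_call ref.fa cohort s1.g.vcf.gz s2.g.vcf.gz s3.g.vcf.gz
```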


Which GATK command did you use to call variants? I have tried a lot but am not able to generate a VCF file.

Here are my commands:

./gatk --java-options "-Xmx4g" HaplotypeCaller -R ../sequence.fasta -I ../trt.bam -O ../output.vcf.gz

I also tried this command:

./gatk HaplotypeCaller -R ../sequence.fasta -I ../sorted-trt.bam -O ../variants-trt.vcf
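The post doesn't include GATK's error message, so the exact cause can't be determined from it. Common reasons HaplotypeCaller stops before writing a VCF are a reference without a .fai index or .dict sequence dictionary next to it, a BAM that is not coordinate-sorted and indexed, and a BAM without read groups (you can check with samtools view -H on the BAM and look for @RG lines). A hedged sketch that prepares the inputs from the post and reruns the call; the sample name and the LB/PL/PU read-group values are placeholders. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print a command, and run it only when DRY_RUN=0.
run() { echo "+ $*"; if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi; }

prepare_and_call() {
  local ref=$1 bam=$2 sample=$3
  local base=${bam%.bam}
  local dict="${ref%.*}.dict"     # sequence.fasta -> sequence.dict
  # Reference: .fai index and sequence dictionary (skipped if the .dict exists).
  run samtools faidx "$ref"
  [ -e "$dict" ] || run gatk CreateSequenceDictionary -R "$ref" -O "$dict"
  # BAM: coordinate-sorted, with a read group whose SM is the sample name.
  run samtools sort -o "$base.sorted.bam" "$bam"
  run gatk AddOrReplaceReadGroups -I "$base.sorted.bam" -O "$base.rg.bam" \
    --RGID "$sample" --RGLB lib1 --RGPL ILLUMINA --RGPU unit1 --RGSM "$sample"
  run samtools index "$base.rg.bam"
  run gatk HaplotypeCaller -R "$ref" -I "$base.rg.bam" -O "$base.vcf.gz"
}

# Example with the paths from the post (sample name is a placeholder):
#   prepare_and_call ../sequence.fasta ../trt.bam trt
```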


Hi Chen, the NYU pipeline that you mentioned seems fine. It appears to be mostly for internal use, though, and you don't appear to be based at NYU...?


An updated version of this pipeline using GATK4 is now available here: https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/

It's available as a Nextflow script on GitHub and is fully Dockerized, so anyone outside NYU can now use the same pipeline.


I agree, it would be awesome if GATK could be used through a front-end application or were more user-friendly!

5.9 years ago

We have written, and provide, open-source software that does exactly what you want. You can find it here:

https://github.com/frankMusacchia/VarGenius

However, it can only be run on a cluster.

Regards

5.7 years ago

There is an easy-to-use, reproducible Snakemake workflow: https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling
