GATK vs more traditional SNP and alignment tools
10.0 years ago

I've been asked to design an easy-to-use SNP caller at work (presumably for staff who don't know how to use a Linux environment and would like to avoid the hassle of learning one). I've gone about doing this with some fairly traditional tools: bowtie2 for alignment, samtools and bcftools to modify SAM files and generate pileups, SNVer for variant calling, etc.
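For concreteness, here is a rough sketch of what such a "traditional" pipeline might look like when driven from Python, which is the kind of thing a GUI front end could wrap. It only builds the command strings and does not run anything. The function name, file names, and single-end read layout are illustrative assumptions, and `bcftools call` stands in for SNVer here because its options are widely documented.

```python
import shlex


def traditional_commands(index_prefix, reference_fasta, fastq, sample):
    """Build shell commands for align -> sort -> index -> call.

    All output file names are hypothetical and derived from `sample`.
    Assumes single-end reads and a bowtie2 index already built at
    `index_prefix`.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"

    align = ["bowtie2", "-x", index_prefix, "-U", fastq, "-S", sam]
    sort = ["samtools", "sort", "-o", bam, sam]
    index = ["samtools", "index", bam]
    # bcftools mpileup streams genotype likelihoods into bcftools call.
    pileup = ["bcftools", "mpileup", "-f", reference_fasta, bam]
    call = ["bcftools", "call", "-mv", "-Oz", "-o", vcf]

    return [
        shlex.join(align),
        shlex.join(sort),
        shlex.join(index),
        shlex.join(pileup) + " | " + shlex.join(call),
    ]


if __name__ == "__main__":
    for cmd in traditional_commands("ref_index", "ref.fa", "reads.fq", "sample1"):
        print(cmd)
```

Keeping the commands as data makes it easy to show them to the user, log them, or swap one step for another tool before anything is executed.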

And then I started reading about platforms like GATK that already do this, and thought that it might be better to investigate that as an option instead.

So, now I'm wondering: for those of you who have used GATK, do you prefer it to more 'traditional' alignment and variant calling methods (i.e., ones where you've written and customized most of the script yourself)? Are there any drawbacks to GATK that I should be aware of before investigating it as a primary alignment + variant calling tool? (I realize that GATK works primarily in a Linux environment, but I shouldn't have a problem creating an external GUI to control some of its features.)

Any feedback would be great! Thank you.

SNP alignment • 4.1k views

Why not just set up a Galaxy pipeline for them? BTW, if you want real data on variant caller comparisons, have a read through Brad Chapman's blog.

Yup, it seems like there is only a need for a GUI, not for a new variant caller.
10.0 years ago

GATK is a set of tools for working with sequencing data. It does NOT include an aligner. It does include several tools (along with Picard) for post-processing BAM files prior to variant calling. It includes two variant callers, UnifiedGenotyper and HaplotypeCaller, with HaplotypeCaller being the recommended one. Finally, GATK includes tools for post-processing VCF files. Many groups routinely use parts of GATK/Picard in their pipelines, so you should definitely investigate it and probably incorporate parts of it into your pipeline.
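As a hedged illustration of where the GATK pieces would slot in after an external aligner, the sketch below builds the argument lists for duplicate marking and HaplotypeCaller. It assumes the GATK4-style `gatk` wrapper (the command syntax differs in the GATK3 era, and UnifiedGenotyper is not part of GATK4); the function name and output file names are made up for the example.

```python
def gatk_commands(reference_fasta, aligned_bam, sample):
    """Build argv lists for BAM post-processing and variant calling.

    Assumptions: `aligned_bam` is coordinate-sorted and has read groups,
    and `reference_fasta` has its .fai index and .dict sequence
    dictionary alongside it. Output names are hypothetical.
    """
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    vcf = f"{sample}.vcf.gz"

    return [
        # Picard's MarkDuplicates as bundled with GATK4.
        ["gatk", "MarkDuplicates", "-I", aligned_bam, "-O", dedup_bam, "-M", metrics],
        # HaplotypeCaller needs an indexed BAM.
        ["samtools", "index", dedup_bam],
        ["gatk", "HaplotypeCaller", "-R", reference_fasta, "-I", dedup_bam, "-O", vcf],
    ]


if __name__ == "__main__":
    for argv in gatk_commands("ref.fa", "sample1.sorted.bam", "sample1"):
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the tools are installed; the alignment step itself still has to come from something like bowtie2 or BWA.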

As Devon suggests, it is probably a good idea to get a sense of the validity of various pipelines from the literature, blogs, and any other resources you can get your hands on. However, at the end of the day, there is no one-size-fits-all solution, so you'll need to define what your goals are (ease of use, validity, speed, sample size, etc.) and then define a pipeline that meets those goals.


Thanks! I missed that GATK can't do general alignment. The remaining description helps a lot :)

HaplotypeCaller should only be used for high-coverage data. Use UnifiedGenotyper, samtools, or FreeBayes for low-coverage SNP calling instead.
10.0 years ago

I agree that you should try to use GATK for variant calling. If it helps, here is a paper with some benchmarks (applied to targeted sequencing experiments):

https://peerj.com/articles/600/

There are also other published benchmarks, and you can use the citations from that paper to help find some of them.

10.0 years ago

The big advantage of GATK is that it does recalibration and realignment, but you don't have to use its callers. If you work with non-human data, you can use something "traditional" like samtools/bcftools for variant calling; the GATK callers are tuned more for human data, if I understand it right.
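To make the "use GATK for recalibration, then call with something else" idea concrete, here is a minimal sketch of the base quality score recalibration (BQSR) step. It assumes GATK4-style commands; the function name and file names are hypothetical. Note that BaseRecalibrator requires a VCF of known variant sites, which may not be available for non-human organisms, and that the separate indel realignment step belongs to older GATK versions and is not shown.

```python
def bqsr_commands(reference_fasta, bam, known_sites_vcf, sample):
    """Build argv lists for GATK4 base quality score recalibration.

    Output file names are hypothetical and derived from `sample`.
    `known_sites_vcf` must be an indexed VCF of known variants.
    """
    table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"

    return [
        # Step 1: model systematic base-quality errors, skipping known sites.
        ["gatk", "BaseRecalibrator", "-R", reference_fasta, "-I", bam,
         "--known-sites", known_sites_vcf, "-O", table],
        # Step 2: write a new BAM with adjusted base qualities.
        ["gatk", "ApplyBQSR", "-R", reference_fasta, "-I", bam,
         "--bqsr-recal-file", table, "-O", recal_bam],
    ]


if __name__ == "__main__":
    for argv in bqsr_commands("ref.fa", "sample1.dedup.bam", "known.vcf.gz", "sample1"):
        print(" ".join(argv))
```

The recalibrated BAM is an ordinary BAM file, so it can be handed to whichever caller you prefer rather than to a GATK one.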
