Hi!
I would like to perform a simple variant analysis (SNPs) across multiple samples. I used the GATK workflow in the past, but I would like to know whether there is anything "better" than GATK nowadays, or is it still the gold standard?
Thanks
GATK hasn't changed a great deal in terms of math and calculations between 3.8 and 4; however, there have been some significant enhancements under the hood in terms of speed and data structures. Variant calling is still a hotly contested topic, specifically the filtering of what is truly variation and what isn't. See the blog post here, which gives a nice comparison between Google's DeepVariant, GATK's VQSR methodology and GATK's still-in-development CNN. All in all, it's still an exciting area of development to keep an eye on.
So to get to the root of your question, I still feel that GATK's variant calling methodology is the gold standard to go by, but it certainly doesn't hurt to compare and contrast methodologies. I'd suggest you look at samtools / bcftools for an alternative approach, which can sometimes be a great help when working with non-model organisms.
To extend Andrew's answer, I heavily favoured GATK for many years, but their pipeline became overly restrictive (inflexible). So, like you, I sought alternatives.
samtools / bcftools is a very simple pipeline, but this fact, ironically, is its advantage. It is excellent for identifying SNVs but not good for indels, which I would instead call with pindel.
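For what it's worth, a multi-sample, SNV-only call with bcftools could be sketched roughly as below. This is only an illustration under assumptions: the file names (ref.fa, sample1.bam, sample2.bam, snvs.vcf.gz) and the two helper functions are hypothetical placeholders, and although the flags are standard `bcftools mpileup` / `bcftools call` options, they are worth checking against your installed version.

```python
import shlex
import subprocess
from typing import List, Tuple


def build_snv_calling_commands(
    reference: str, bams: List[str], out_vcf: str
) -> Tuple[List[str], List[str]]:
    """Return the (mpileup, call) argument lists for joint SNV calling."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    # -f: reference FASTA; -Ou: uncompressed BCF, which is cheap to pipe.
    # Passing all BAMs to one mpileup gives a single multi-sample call set.
    mpileup = ["bcftools", "mpileup", "-f", reference, "-Ou", *bams]
    # -m: multiallelic caller; -v: output variant sites only;
    # -V indels: skip indels (to be called separately, e.g. with pindel);
    # -Oz: bgzip-compressed VCF output.
    call = ["bcftools", "call", "-m", "-v", "-V", "indels",
            "-Oz", "-o", out_vcf]
    return mpileup, call


def run_pipeline(mpileup: List[str], call: List[str]) -> None:
    """Run `mpileup | call`, raising if either step fails."""
    producer = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(call, stdin=producer.stdout)
    # Close our copy of the pipe so the producer sees SIGPIPE
    # if the consumer exits early.
    producer.stdout.close()
    consumer.communicate()
    if producer.wait() != 0 or consumer.returncode != 0:
        raise RuntimeError("bcftools pipeline failed")


if __name__ == "__main__":
    # Placeholder inputs; replace with your own reference and sorted BAMs.
    mp, cl = build_snv_calling_commands(
        "ref.fa", ["sample1.bam", "sample2.bam"], "snvs.vcf.gz"
    )
    # Print the equivalent shell pipeline instead of running it.
    print(shlex.join(mp), "|", shlex.join(cl))
```

Running the script only prints the equivalent shell pipeline; call `run_pipeline(mp, cl)` once bcftools is installed and the inputs exist. Any filtering of the resulting calls is a separate step.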
The final point: no variant caller can completely mitigate the error that comes with using a sub-standard (but rapid) sequencing technology like NGS.
Mods - Probably worth turning this into a forum post?
Good suggestion, done.