The accuracy of NGS SNP calls is usually assessed by calculating the concordance between the NGS calls and SNP array data (the array genotypes being treated as the truth set).
Are there any tools out there that calculate this concordance?
Any thoughts and suggestions would be much appreciated.
Have a look at VariantEval in the GATK, in particular using the -goldStandard argument. You can provide a VCF containing your SNP array genotypes as the gold standard to compare your sequencing-derived VCF against.
As far as I am aware, there is nothing "off the shelf" that could do this.
The way I am dealing with a similar issue is to get all our SNP data into a single database (MySQL) in a common format that I have devised using a Perl script.
A second Perl script then loops through every position in turn and checks the genotype there. Whatever the result, I write it to another table in the database, which we can sort through later.
I have also added in another level of complexity by comparing the same position across different samples, before deciding whether the SNP could be considered true.
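For anyone who wants to try the position-by-position comparison without setting up a database, here is a minimal sketch in Python rather than Perl. It is not the script described above. It assumes single-sample, tab-delimited VCF text with GT as the first FORMAT key, and it only compares sites called in both sets. The names parse_genotypes and concordance are made up for this illustration.

```python
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]        # (chromosome, 1-based position)
Genotype = Tuple[str, ...]    # sorted allele strings, e.g. ("A", "G")


def parse_genotypes(vcf_text: str) -> Dict[Site, Genotype]:
    """Read single-sample VCF text into {(chrom, pos): sorted alleles}."""
    calls: Dict[Site, Genotype] = {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alleles = [ref] + alt.split(",")
        # Assumption: GT is the first FORMAT key and the sample is column 10.
        gt = fields[9].split(":")[0].replace("|", "/")
        if "." in gt:
            continue  # skip no-calls
        # Sorting makes the comparison independent of allele order (0/1 vs 1/0).
        calls[(chrom, pos)] = tuple(sorted(alleles[int(i)] for i in gt.split("/")))
    return calls


def concordance(
    ngs: Dict[Site, Genotype], array: Dict[Site, Genotype]
) -> Tuple[int, int, Optional[float]]:
    """Return (matching sites, shared sites, fraction) over sites in both sets."""
    shared = ngs.keys() & array.keys()
    if not shared:
        return 0, 0, None
    matches = sum(1 for site in shared if ngs[site] == array[site])
    return matches, len(shared), matches / len(shared)


def make_vcf(rows):
    """Build tab-delimited VCF body lines from (chrom, pos, ref, alt, gt) tuples."""
    return "\n".join(
        "\t".join([c, str(p), ".", r, a, ".", ".", ".", "GT", g])
        for c, p, r, a, g in rows
    )


# Toy data, invented purely for illustration.
ngs_calls = parse_genotypes(make_vcf([
    ("chr1", 100, "A", "G", "0/1"),
    ("chr1", 200, "C", "T", "1/1"),
    ("chr1", 300, "G", "A", "0/1"),
    ("chr2", 50, "T", "C", "0/0"),
]))
array_calls = parse_genotypes(make_vcf([
    ("chr1", 100, "A", "G", "0/1"),
    ("chr1", 200, "C", "T", "0/1"),
    ("chr1", 300, "G", "A", "1/0"),
    ("chr2", 75, "T", "C", "0/1"),
]))

matches, shared, fraction = concordance(ngs_calls, array_calls)
print(f"{matches} of {shared} shared sites concordant")
```

In the toy data, three sites are present in both sets and two of them have the same genotype. Sites found in only one set are ignored here, so you would need to count those separately if you also care about missed or extra calls.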
Thanks! I looked all over for a solution to this simple problem. For me, putting my two VCF files as the --comp and --eval parameters did the trick.