Before I start coding: is there an existing tool that can switch the reference alleles in a large VCF file (~400K variants) to match a reference genome, re-encoding all the genotypes accordingly?
We have a large amount of legacy data in PLINK format that we would like to run through some of the modules in GATK's VariantEval tool to compare it with whole-exome data. I tried converting the PLINK data to VCF using PLINK v1.08. However, PLINK has no mechanism for specifying the reference allele, and the output did not match our sequencing files.
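To make the operation concrete, here is a minimal Python sketch of the per-record swap I have in mind. It rests on several assumptions: the site is a biallelic SNP, the base from the reference genome has already been looked up for that position, and the function name `swap_ref_allele` is just a hypothetical helper, not part of any existing tool. It does not handle strand flips, and it does not update allele-specific INFO or FORMAT fields such as AC, AF, AD, or PL.

```python
# Translation table that exchanges allele indices 0 and 1 in a GT string,
# e.g. "0/1" -> "1/0" and "1|1" -> "0|0". Missing calls ("./.") are untouched.
FLIP = str.maketrans("01", "10")


def swap_ref_allele(vcf_line, genome_base):
    """Swap REF and ALT if ALT matches the genome base; re-encode GT fields.

    Assumes a biallelic record. Returns the (possibly rewritten) line
    without a trailing newline.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]

    if "," in alt:
        raise ValueError("multi-allelic sites are not handled in this sketch")
    if ref.upper() == genome_base.upper():
        return "\t".join(fields)  # REF already agrees with the genome
    if alt.upper() != genome_base.upper():
        # Neither allele matches: possibly a strand flip or a position error.
        raise ValueError("neither REF nor ALT matches the genome base")

    fields[3], fields[4] = alt, ref

    # Columns 10+ are samples; column 9 (index 8) is the FORMAT key list.
    if len(fields) > 9:
        gt_idx = fields[8].split(":").index("GT")
        for i in range(9, len(fields)):
            sample = fields[i].split(":")
            sample[gt_idx] = sample[gt_idx].translate(FLIP)
            fields[i] = ":".join(sample)

    return "\t".join(fields)


record = "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/0\t0/1\t1/1\t./."
print(swap_ref_allele(record, "G"))
```

For the example record, REF and ALT become G and A, and the genotypes become 1/1, 1/0, 0/0, and ./. respectively. What I am looking for is a tool that does this robustly for the whole file, including the cases this sketch skips.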
This is not an answer. Please add this as a comment to the original question. Thank you.