I have a VCF file that I believe was converted from a PED file by PLINK, as illustrated in this blog:
The header of the VCF file contains this line: ##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
For some variant loci, REF and ALT have been switched in the VCF file for an unknown reason. For example, RefSeq has G at locus 1234 and the variant is T, but the VCF file records T (REF) and G (ALT).
I only have the VCF file and do not have the original PED file. Is there any tool or method to check whether the REF alleles agree with RefSeq, and to switch the REF and ALT columns (or just remove these loci) in the VCF if they're wrong?
Thanks!
I guess the solution would go something like this:
Check the reference allele against the genome reference (hg19.fa or hg38.fa).
If the reference allele does not match the human reference genome, check whether the alternative allele does.
If the alternative allele matches the human reference genome, switch REF and ALT. For the sample genotypes, 0 -> 1 and 1 -> 0.
If neither the reference nor the alternative allele matches the human reference genome, then .... remove it??
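The steps above could be sketched roughly as follows. This is only a minimal illustration, not a tested tool. It assumes biallelic SNP records, and it assumes the reference is available as a plain dict mapping chromosome name to sequence. The function names `swap_gt` and `fix_record` and the toy `reference` are made up for this example. For a real hg19.fa or hg38.fa you would look the base up in an indexed FASTA instead.

```python
def swap_gt(sample, gt_index):
    """Swap allele codes 0 and 1 in the GT sub-field of one sample column."""
    parts = sample.split(":")
    swap = {"0": "1", "1": "0"}
    # Separators (/ or |) and missing calls (.) pass through untouched.
    parts[gt_index] = "".join(swap.get(ch, ch) for ch in parts[gt_index])
    return ":".join(parts)


def fix_record(line, reference):
    """Return the record (possibly with REF/ALT switched), or None to drop it.

    `reference` is a hypothetical dict {chrom: sequence}; POS is 1-based.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    ref, alt = fields[3].upper(), fields[4].upper()

    # Assumption: only biallelic SNPs are handled; anything else is left as-is.
    if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
        return "\t".join(fields)

    genome_base = reference[chrom][pos - 1].upper()

    if ref == genome_base:
        # REF already agrees with the reference genome.
        return "\t".join(fields)

    if alt == genome_base:
        # Switch REF and ALT, then recode the genotypes: 0 -> 1 and 1 -> 0.
        fields[3], fields[4] = fields[4], fields[3]
        if len(fields) > 9:
            keys = fields[8].split(":")
            if "GT" in keys:
                gt_index = keys.index("GT")
                fields[9:] = [swap_gt(s, gt_index) for s in fields[9:]]
        return "\t".join(fields)

    # Neither allele matches the reference genome: drop the record.
    return None


# Toy example: position 5 of this made-up sequence is G.
reference = {"chr1": "ACGTG"}
record = "chr1\t5\trs1\tT\tG\t.\t.\tPR\tGT\t0/1\t1/1\t0/0\t./."
print(fix_record(record, reference))
```

Note that this sketch does not update allele-dependent INFO values (for example allele counts or frequencies, if present), so those would need to be recomputed after switching.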