I have array genotype data that I am phasing with Eagle2. I want to be able to measure how well the phasing is going. I have trios in my population.
My question is mostly to check my understanding and to see whether my planned workflow for determining phasing accuracy is valid. I am new to this.
From reading around, it seems that a common metric for assessing phasing quality is the switch error rate: the fraction of consecutive heterozygous sites at which the inferred phase switches between the known maternal and paternal haplotypes.
I know there are tools and formulas available to perform this calculation by comparing a test VCF against some “ground truth” VCF.
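To check that I have the metric right, here is a minimal sketch of how I understand the calculation for a single sample. It is not taken from any particular tool. The function name `switch_error_rate` and the input format (two lists of phased GT strings such as `0|1`, one entry per site, in the same order) are my own assumptions for illustration.

```python
def switch_error_rate(truth, test):
    """Return (switches, opportunities, rate) for one sample.

    truth, test: lists of phased genotype strings like "0|1",
    one per site, in the same site order (an assumed input format).
    """
    orientation = []
    for t, p in zip(truth, test):
        ta, tb = t.split("|")
        pa, pb = p.split("|")
        # Homozygous sites carry no phase information, so skip them.
        if ta == tb or pa == pb:
            continue
        # Skip sites where the two files disagree on the genotype itself.
        if {ta, tb} != {pa, pb}:
            continue
        # True if the test phase matches the truth orientation at this site.
        orientation.append(ta == pa)

    if len(orientation) < 2:
        return 0, 0, float("nan")

    # A switch is a change of orientation between consecutive het sites.
    switches = sum(1 for prev, cur in zip(orientation, orientation[1:]) if prev != cur)
    opportunities = len(orientation) - 1
    return switches, opportunities, switches / opportunities


truth = ["0|1", "0|1", "1|0", "0|1", "1|1"]
test = ["0|1", "1|0", "0|1", "1|0", "1|1"]
print(switch_error_rate(truth, test))
```

In this toy example the test haplotypes flip once after the first heterozygous site and then stay flipped, so I would count that as one switch out of three opportunities rather than three separate errors.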
If I understand correctly, “trio phasing” methods that use Mendelian inheritance rules are theoretically able to phase a trio perfectly, except at sites where all three individuals of the trio are heterozygous. I believe the BEAGLE software does this.
My thought right now is to:
1. Use BEAGLE to generate a “ground truth” VCF for my trios.
2. Compute the switch error rate by comparing the “ground truth” VCF from BEAGLE to my Eagle2 VCF.
3. When calculating switch error rate, ignore sites where all individuals of the trio are heterozygous.
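For step 3, this is roughly how I imagine flagging the sites to ignore. It is only a sketch under my own assumptions: the helper names `is_het` and `trio_informative_mask` are hypothetical, genotypes are given as GT strings (`0/1` or `0|1`) per site for the child, father, and mother, and missing genotypes (`./.`) are treated as not heterozygous.

```python
def is_het(gt):
    """True if a diploid GT string such as "0/1" or "1|0" is heterozygous."""
    a, b = gt.replace("|", "/").split("/")
    return a != b


def trio_informative_mask(child, father, mother):
    """Return one boolean per site: False where all three trio members are het.

    Those triple-het sites cannot be phased from Mendelian rules alone,
    so they would be dropped before computing the switch error rate.
    """
    return [
        not (is_het(c) and is_het(f) and is_het(m))
        for c, f, m in zip(child, father, mother)
    ]


child = ["0/1", "0/1", "1/1", "0/1"]
father = ["0/1", "0/0", "0/1", "0/1"]
mother = ["0/1", "0/1", "1/1", "1/1"]
keep = trio_informative_mask(child, father, mother)
print(keep)
```

Only the sites where `keep` is true would then go into the comparison between the BEAGLE and Eagle2 phasing.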
Is this a reasonable approach to evaluating the phasing accuracy of Eagle2, or am I way off? If so, is there a recommended alternative? Thank you.