Let's say two variants are genotyped in 100 people. The variants are close together (within 2000 bp), and the genotypes are not phased.
Let's say that the following genotype combinations are present, written as count(genotype at variant 1, genotype at variant 2):
1(1/1, 0/0)
1(0/0, 1/1)
3(0/1, 1/1)
3(0/1, 0/0)
5(0/0, 0/0)
70(1/1, 1/1)
17(0/1, 0/1)
When phasing these two variants together, the only real problem occurs with the 17 heterozygote-heterozygotes. In all the other cases, we know which ALT (1) and REF (0) alleles are on the same strand, because there's no other option. Among the people whose phase is known and who carry both a REF and an ALT allele, there are 10 strands with a REF allele at one position and an ALT allele at the other, 4 of them from the two people who are homozygous at both positions. There are also 6 strands with the same allele (REF or ALT) at both positions.
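To double-check the counting, here is a minimal Python sketch that tallies the strands whose phase is forced by the genotypes. The table is copied from the list above; the names `combos` and `tally_haplotypes` are made up for illustration and have nothing to do with BEAGLE.

```python
from collections import Counter

# (number of people, genotype at variant 1, genotype at variant 2),
# copied from the list above
combos = [
    (1, "1/1", "0/0"),
    (1, "0/0", "1/1"),
    (3, "0/1", "1/1"),
    (3, "0/1", "0/0"),
    (5, "0/0", "0/0"),
    (70, "1/1", "1/1"),
    (17, "0/1", "0/1"),
]


def tally_haplotypes(combos):
    """Count strands with known phase and people with ambiguous phase."""
    known = Counter()
    ambiguous_people = 0
    for n, gt1, gt2 in combos:
        a1 = gt1.split("/")
        a2 = gt2.split("/")
        if a1[0] != a1[1] and a2[0] != a2[1]:
            # Heterozygous at both positions: phase cannot be read off directly
            ambiguous_people += n
            continue
        # At least one position is homozygous, so pairing the alleles
        # in order gives the only possible pair of strands
        for x, y in zip(a1, a2):
            known[x + y] += n
    return known, ambiguous_people


known, ambiguous_people = tally_haplotypes(combos)
for hap in ("00", "01", "10", "11"):
    print(hap, known[hap])
print("people with ambiguous phase:", ambiguous_people)
```

Each key is the allele at variant 1 followed by the allele at variant 2, so "01" and "10" are the strands that mix REF and ALT.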
Despite the 75 people who are homozygous for the same allele at both positions, it seems that some of the 17 heterozygote-heterozygotes should have both REFs on one strand and both ALTs on the other, while others should have a REF and an ALT on each strand.
However, I have tried phasing with BEAGLE 4.1 many times with a variety of parameters, and I keep getting the same result: all of these cases have both REFs on one strand and both ALTs on the other. The only way to change this is to reduce the window size to 5, but that does not seem right at all, and the results don't match the expected phenotype data.
Any suggestions?