Can someone explain why imputation quality generally decreases as MAF decreases?
Assume the very best case: a rare variant is present on one, and only one, parent haplotype.
Case 1: Haplotype: 20%, variant: 1% (5% of parent haplotype). In this case, someone who has the parent haplotype has only a 5% chance of harboring the rare variant on it, which makes it quite difficult to impute on the basis of the parent haplotype.
Case 2: Haplotype: 5%, variant: 1% (20% of parent haplotype). In this case, someone who has the parent haplotype has a 20% chance of harboring the rare variant on that haplotype. If they are homozygous for the parent haplotype, the chance that they received at least one copy of the rare variant rises to 1 - 0.8^2 = 36%, compared with under 10% (1 - 0.95^2) for a Case 1 homozygote.
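The arithmetic behind the two cases can be checked with a short sketch. It assumes, as above, that the variant sits on exactly one parent haplotype, and that a person's two haplotype copies are independent draws (Hardy-Weinberg style). The function names are illustrative only.

```python
def p_variant_given_haplotype(variant_freq: float, haplotype_freq: float) -> float:
    """Share of parent-haplotype copies that carry the variant,
    assuming the variant occurs only on that one haplotype."""
    if not 0 < variant_freq <= haplotype_freq <= 1:
        raise ValueError("need 0 < variant_freq <= haplotype_freq <= 1")
    return variant_freq / haplotype_freq


def p_at_least_one_copy(variant_freq: float, haplotype_freq: float, n_copies: int) -> float:
    """Chance that someone with n_copies of the parent haplotype carries the
    variant on at least one of them, treating each copy as an independent draw."""
    q = p_variant_given_haplotype(variant_freq, haplotype_freq)
    return 1 - (1 - q) ** n_copies


for label, hap_freq in [("Case 1", 0.20), ("Case 2", 0.05)]:
    het = p_at_least_one_copy(0.01, hap_freq, 1)  # one copy of the parent haplotype
    hom = p_at_least_one_copy(0.01, hap_freq, 2)  # homozygous for the parent haplotype
    print(f"{label}: heterozygote {het:.2%}, homozygote {hom:.2%}")
```

Even the more favourable Case 2 homozygote is closer to a coin flip than to a confident call, which is the point of the argument.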
Even in the best case, the ability to impute a rare variant is limited by the resolution of your imputation panel (how well it catalogues low-frequency haplotypes). On top of that, some rare variants are old and have therefore been spread across multiple haplotypes by recombination, so imputation runs into statistical limits very quickly.
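To see why spreading across haplotypes hurts, here is a hypothetical sketch (the helper and its inputs are assumptions, not part of any imputation tool). Each background haplotype is described by its frequency and the share of the variant's copies it carries. For each background it reports what fraction of that background's copies carry the variant, and what fraction of all variant copies that background accounts for.

```python
import math
from dataclasses import dataclass


@dataclass
class BackgroundTag:
    carrier_fraction: float  # share of this background's copies that carry the variant
    variant_captured: float  # share of all variant copies sitting on this background


def tag_backgrounds(variant_freq, backgrounds):
    """backgrounds: list of (background_freq, share_of_variant_copies).
    Illustrative only; assumes the shares sum to 1."""
    if not math.isclose(sum(share for _, share in backgrounds), 1.0):
        raise ValueError("variant shares must sum to 1")
    tags = []
    for bg_freq, share in backgrounds:
        carried = variant_freq * share  # population frequency of variant on this background
        if carried > bg_freq:
            raise ValueError("a background cannot carry more copies than it has")
        tags.append(BackgroundTag(carried / bg_freq, share))
    return tags


# 1% variant on a single 5% background (Case 2 above) ...
single = tag_backgrounds(0.01, [(0.05, 1.0)])
# ... versus the same variant split evenly across two 5% backgrounds by recombination
split = tag_backgrounds(0.01, [(0.05, 0.5), (0.05, 0.5)])
print(single)
print(split)
```

In the split case each background is half as predictive and tags only half of the variant's copies, so the panel has to resolve both backgrounds to recover the same information.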
In other words, it's not enough to have a strong estimate of the rare variant's allele frequency in the panel; you need a panel large enough to refine its parent haplotype (by including additional variation) to approximately the same frequency as the variant. In general, this means having many, many more samples than are needed merely to bound the MAF.