I am using the Michigan Imputation Server to perform phasing and imputation. Under the hood, the server uses minimac4 for the imputation step. The imputation accuracy of each SNP is described by an R-squared value.
I am having a hard time understanding how such an accuracy can be calculated. I understand that it is straightforward to calculate the accuracy at sites that were actually genotyped, but I do not know how to do it for genotypes that were "purely" imputed. I found on the minimac3 homepage that they describe the R2 value as follows:
This is the estimated value of the squared correlation between imputed genotypes and true, unobserved genotypes. Since true genotypes are not available, this calculation is based on the idea that poorly imputed genotype counts will shrink towards their expectations based on population allele frequencies alone; specifically 2p where p is the frequency of the allele being imputed.
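My tentative reading of this, which may well be wrong, is that the estimate compares the observed variance of the imputed haploid dosages with the variance p(1-p) that perfectly known alleles at frequency p would have. Below is a minimal Python sketch of that reading. The function name `estimated_rsq` and the toy dosage lists are made up by me for illustration, and I have not checked this against the actual minimac source code.

```python
def estimated_rsq(haploid_dosages):
    """Ratio of observed dosage variance to the variance expected for
    perfectly known alleles, p * (1 - p).

    haploid_dosages: one imputed allele dosage in [0, 1] per haplotype
    (an assumption about the input, not taken from the minimac code).
    """
    n = len(haploid_dosages)
    p = sum(haploid_dosages) / n  # estimated allele frequency
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic site: the ratio is undefined, report 0
    observed_var = sum((d - p) ** 2 for d in haploid_dosages) / n
    return observed_var / (p * (1.0 - p))


# Confident imputation: every dosage is close to 0 or 1.
confident = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

# Partly uncertain imputation: dosages pulled towards the frequency.
uncertain = [0.2, 0.8, 0.2, 0.2, 0.8, 0.2, 0.2, 0.2, 0.8, 0.2]

# Uninformative imputation: every dosage equals the allele frequency.
shrunk = [0.3] * 10

rsq_confident = estimated_rsq(confident)
rsq_uncertain = estimated_rsq(uncertain)
rsq_shrunk = estimated_rsq(shrunk)

print(rsq_confident, rsq_uncertain, rsq_shrunk)
```

If this reading is right, the confident site should score close to 1, the fully shrunk site close to 0, and the uncertain site somewhere in between, all without ever seeing the true genotypes.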
Can somebody explain to me what exactly that means and how this helps? Maybe with an example?
Any help is much appreciated!