Computing R Squared Statistics For Imputation Results - How?
2
4
Entering edit mode
11.1 years ago

I have two plink files, one containing the actual SNPs and one containing the result of imputing the same SNPs using a much smaller subset.

What would be the simplest way of getting the r^2 (r squared) scores?

plink has a merge mode that calculates a diff file, but I do not know if it outputs any statistics: http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#merge

The R package snpStats can read plink data, but it does not seem to be able to compare two SNP sets as far as I can see: http://www.bioconductor.org/packages/2.12/bioc/html/snpStats.html

There is a program called http://genome.sph.umich.edu/wiki/CalcMatch that seems to be able to calculate these stats, but require the files to be in merlin format which seems totally obscure and like a hassle converting to it in my pipeline. But might be the best option so far. A conversion script is here: http://www.pypedia.com/index.php/bioinformatics_format_convert

imputation • 8.6k views
ADD COMMENT
4
Entering edit mode
11.1 years ago
zx8754 12k

If typed SNPs are not in 0,1,2 (RAW) plink format, then convert them to raw format using --recodeA option. If the imputed SNPs are posterior probabilities(3 values per SNP) then convert them to dosage(one number ranged 0-2), e.g.: if SNPx for sampleY is 0.3, 0.6, 0.1, then to convert it to dosage: 0.3*0+0.6*1+0.1*2=0.8. Then just use cor to get correlation.

EDIT: Added solution Rscript with example files to GitHub.

ADD COMMENT
2
Entering edit mode
11.1 years ago

http://www.g3journal.org/content/1/6/457.full:

Once every individual in a panel had been masked and imputed, we assessed accuracy at each SNP as the squared Pearson correlation (R2) between the masked genotypes, which take values in {0,1,2}, and the imputed allele dosages (also known as posterior mean genotypes), which take values in [0,2]

So in my case wouldn't use Plink data, but the dosage files and just compute use the regular cor(x, y, method = c("pearson")) or something similar in NumPy

ADD COMMENT

Login before adding your answer.

Traffic: 2661 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6