7.2 years ago
alessandrotestori7
Hello! I'm using ImpG-Summary (Pasaniuc et al., 2014) to perform genotype imputation from summary statistics. However, for each imputed SNP, I only get a z-score, while the odds ratio and standard error are missing. The output file has six columns for each SNP: SNP name, SNP position, Ref allele, Alt allele, Z-score, and r2pred; the higher r2pred is, the more confident one can be in the imputed result. Can I approximate the odds ratio, effect size, and standard error somehow? Please let me know. Thanks in advance!
Hey, you will have to provide more information, as one can only hypothesise about the specifics of your analysis based on the current information that you've provided. For example, which imputation program have you used and in which format is your data currently?
If you are interested in obtaining odds ratios and standard errors (between 2 conditions of interest, I assume?), then the basic association test of PLINK is a good starting point: http://zzz.bwh.harvard.edu/plink/anal.shtml
Thanks Kevin for your comment, I have updated my post with more information.
This is an interesting question, but I do not believe that you have enough information to calculate what you want. There was a similar question posted ~7 months ago: "convert GWAS Zscore back to OR"
If you had summary statistics prior to using ImpG-Summary, then what did these summary statistics contain?
I had summary statistics containing ref allele, alt allele, MAF in cases, MAF in controls, chisq, p-value, OR, L95, U95, ln(OR), SE of ln(OR)
I would contact the authors. Their emails are at the beginning of the manual: http://bogdan.bioinformatics.ucla.edu/wp-content/uploads/sites/3/2013/07/ImpG_v1.0_User_Manual_31July13.pdf
From the Z-score alone, it may be difficult to calculate the OR, as too much information is missing. The formula is something like:
Z = ln(OR) / OR StdErr
ln is the natural log
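OR StdErr is the standard error of the log odds, calculated (for a 2x2 table of allele counts by case/control status, with cell counts a, b, c, and d) as:
OR StdErr = sqrt(1/a + 1/b + 1/c + 1/d)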
I may try to work through this formula later on a scrap of paper.
One of the authors actually states that the imputed Z-score divided by the square root of the sample size yields the effect size (beta) on the standardized scale (i.e. when both phenotype and genotype are standardized to have mean 0 and variance 1). I already contacted them through their GitHub repository: https://github.com/huwenboshi/ImpG/issues (issue #5). The same relationship is described in Pasaniuc & Price, 2017 (Nature Reviews Genetics).
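For what it's worth, the relationship described above (beta on the standardized scale is roughly Z / sqrt(N)) can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, N is assumed to be the total sample size behind the summary statistics, and the standard error of 1 / sqrt(N) is simply what follows from beta / SE = Z, not something taken from the ImpG documentation.

```python
import math


def standardized_beta(z, n):
    """Approximate effect size on the standardized scale: beta ~ z / sqrt(n).

    Assumes phenotype and genotype are both standardized (mean 0, variance 1)
    and that n is the sample size behind the summary statistics.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z / math.sqrt(n)


def standardized_se(n):
    """Standard error implied by beta / se = z on the same scale: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1.0 / math.sqrt(n)


# Illustrative (made-up) values: imputed z-score of 4 in a study of 10,000 samples.
beta = standardized_beta(4.0, 10000)
se = standardized_se(10000)
```

Note that this beta is not an odds ratio; it lives on the standardized scale, so it is not directly comparable to the ln(OR) values in the original summary statistics.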
Okay, let me know what they say. I'd be interested.
It looks like we could calculate the ORs from the Z-scores but we would be making a few assumptions along the way without direct evidence.
One thing I noticed is that in my dataset, since the sample size is fixed for all SNPs, the SE of ln(OR) and the allele frequency are well correlated, so it is easy to approximate the SE for narrow intervals of allele frequency. For example, all SNPs with allele frequency between 0.40 and 0.41 tend to have the same SE. I can thus obtain the SE for different ranges of allele frequency from the typed SNPs and assign it to the imputed SNPs. ln(OR) is then easy to compute: z-score * SE. I guess SE is mainly a function of sample size and allele frequency.
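The binning idea described above could look something like the following Python sketch. It is an illustration under several assumptions, not a tested pipeline: the bin width of 0.01, the use of the median SE per bin, and all function names are choices made here, the example numbers are invented, and the allele frequency of each imputed SNP is assumed to come from somewhere else (e.g. a reference panel), since the ImpG output described in the question does not include it.

```python
import math
from collections import defaultdict
from statistics import median

BIN_WIDTH = 0.01  # assumed bin width, so 0.40-0.41 is one bin


def freq_bin(freq, width=BIN_WIDTH):
    """Index of the allele-frequency bin that freq falls into."""
    # The small epsilon guards against floating-point results like 40.999999.
    return math.floor(freq / width + 1e-9)


def se_by_bin(typed_snps, width=BIN_WIDTH):
    """Median SE of ln(OR) per frequency bin, from (freq, se) pairs of typed SNPs."""
    grouped = defaultdict(list)
    for freq, se in typed_snps:
        grouped[freq_bin(freq, width)].append(se)
    return {b: median(values) for b, values in grouped.items()}


def approximate_effect(z, freq, se_table, width=BIN_WIDTH):
    """Return (ln_or, se, odds_ratio) for an imputed SNP.

    Returns None when no typed SNP fell into the same frequency bin.
    """
    se = se_table.get(freq_bin(freq, width))
    if se is None:
        return None
    ln_or = z * se  # ln(OR) = z-score * SE
    return ln_or, se, math.exp(ln_or)


# Invented typed SNPs as (allele frequency, SE of ln(OR)) pairs.
typed = [(0.402, 0.030), (0.405, 0.032), (0.408, 0.031), (0.101, 0.050)]
table = se_by_bin(typed)

# Invented imputed SNP: z-score 2.0, allele frequency 0.404.
result = approximate_effect(2.0, 0.404, table)
```

Returning None for empty bins is deliberate: it seems safer to leave an imputed SNP without an SE than to borrow one from a distant frequency range. It may also be worth down-weighting or filtering SNPs with low r2pred before trusting the result.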
There is a paper that describes converting z-statistics back to effect sizes: https://www.ncbi.nlm.nih.gov/pubmed/27019110. See the online methods section. Basically,
beta = z / sqrt(2p(1-p)(n + z^2))
where p is the allele frequency and n is the sample size, so it requires both of these in addition to the z-score. Hopefully, this might help.
Sure it helps! Thanks a lot!
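For anyone landing here later, the z-to-beta conversion quoted above can be sketched in Python as follows. The function name is made up for illustration, and the standard error is taken as beta / z, i.e. 1 / sqrt(2p(1-p)(n + z^2)), which as far as I can tell matches the methods of the linked paper. The resulting beta is a per-allele effect for a trait scaled to unit variance, so reading it as ln(OR) in a case/control study would be a further approximation.

```python
import math


def z_to_beta_se(z, p, n):
    """Approximate per-allele effect size and SE from a z-score.

    z: z-score of the SNP
    p: allele frequency (strictly between 0 and 1)
    n: sample size
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("sample size must be positive")
    denom = math.sqrt(2.0 * p * (1.0 - p) * (n + z * z))
    beta = z / denom
    se = 1.0 / denom
    return beta, se


# Invented example: z-score 5, allele frequency 0.3, 10,000 samples.
beta, se = z_to_beta_se(5.0, 0.3, 10000)
```

By construction beta / se gives back the original z-score, which is a quick sanity check when applying this to a whole file of imputed SNPs.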