Hello,
I have conducted a large-scale GWAS and found a few significantly associated SNPs. I used GEMMA with the -lmm 1 option to run the GWAS and obtain the beta and standard-error estimates. I want to estimate the percent phenotypic variation explained by each of the significant SNPs. I used the following procedure to estimate the variance explained in R:
fit <- lm (Phenotypic_value ~ SNP_data, data = a)
summary(fit)$r.squared
Here, the data file a contains three columns, namely sample_ID, Phenotypic_value for each sample, and the biallelic SNP_data. I got a value of 0.43, meaning 43% of the phenotypic variation is explained by the SNP.
I then used another formula: 2*f*(1-f)*b.alt^2. Here, f is the minor allele frequency and b.alt is the effect size, i.e. the beta estimate obtained from GEMMA. This gives me a value of 0.03, meaning 3% of the variation is explained, which seems reasonable to me.
My question is: which of these methods is correct? Or is there another way to estimate the percent variation explained?
Alternatively, from the GEMMA Google group, I got this formula: pve <- var(x) * (beta^2 + se^2)/var(y). But I do not understand how to obtain the values of var(x) and var(y).
It will be great to receive some feedback on this. Thank you.
In your case:
In linear regression involving no covariates (y = alpha + beta*x + e), the correlation coefficient between x and y can be expressed as
r = beta * sqrt(var(x)) / sqrt(var(y))
and then you want to take the square of this. I am not sure where the se^2 term comes from, but I see the author of GEMMA does not back up his claim. Generate some fake data in R and you'll see that the formula is wrong and the se^2 does not belong there (for simple regression). There is no reason why an estimate with a higher se would explain a higher % of the variance. Maybe it has to do with the fact that GEMMA fits an LMM; I don't know, as I am not familiar enough with it.
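A quick simulation along those lines, as a minimal sketch: the sample size, allele frequency and effect size are made-up values, and the genotype is drawn as a 0/1/2 allele count assuming Hardy-Weinberg proportions. It compares the R-squared reported by lm() with beta^2 * var(x) / var(y), and with the version that adds se^2.

```r
# Simulated data; n, f and the true effect are arbitrary choices for illustration.
set.seed(1)
n <- 2000
f <- 0.3
x <- rbinom(n, size = 2, prob = f)   # genotype as allele count 0/1/2
y <- 0.5 * x + rnorm(n)              # phenotype with a true effect of 0.5

fit  <- lm(y ~ x)
est  <- summary(fit)$coefficients
beta <- est["x", "Estimate"]
se   <- est["x", "Std. Error"]

r2_lm      <- summary(fit)$r.squared
r2_beta    <- beta^2 * var(x) / var(y)            # square of beta * sd(x) / sd(y)
r2_with_se <- (beta^2 + se^2) * var(x) / var(y)   # formula with the extra se^2 term

print(c(lm = r2_lm, beta_only = r2_beta, with_se = r2_with_se))
```

The first two values should agree to numerical precision, while the third is larger by se^2 * var(x) / var(y). This only covers ordinary simple regression; it says nothing about how the estimates from an LMM behave.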
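Since var(x) = 2*f*(1-f) for a biallelic SNP coded as a 0/1/2 allele count (assuming Hardy-Weinberg equilibrium), your other formula is equivalent only if your y has unit variance.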
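To see how the two expressions relate, here is a small sketch on simulated data, with hypothetical values for n, f and the effect size. It checks that var(x) is close to 2*f*(1-f) when genotypes are in Hardy-Weinberg proportions, and that 2*f*(1-f)*beta^2 only gives a proportion of variance once it is divided by var(y), or once y has been scaled to unit variance.

```r
set.seed(2)
n <- 5000
f <- 0.25                                 # hypothetical allele frequency
x <- rbinom(n, size = 2, prob = f)        # 0/1/2 allele counts under HWE
y <- 10 + 2 * x + rnorm(n, sd = 4)        # phenotype on an arbitrary scale

f_hat <- mean(x) / 2                      # allele frequency estimated from the data
print(c(var_x = var(x), hwe = 2 * f_hat * (1 - f_hat)))

# Raw scale: 2*f*(1-f)*beta^2 is in units of y^2, not a proportion
beta_raw        <- coef(lm(y ~ x))[["x"]]
pve_raw_formula <- 2 * f_hat * (1 - f_hat) * beta_raw^2
pve_scaled      <- pve_raw_formula / var(y)   # divide by var(y) to get a proportion

# Standardized phenotype: var(y_std) = 1, so no division is needed
y_std           <- as.numeric(scale(y))
beta_std        <- coef(lm(y_std ~ x))[["x"]]
pve_std_formula <- 2 * f_hat * (1 - f_hat) * beta_std^2

r2 <- summary(lm(y ~ x))$r.squared
print(c(r2 = r2, scaled = pve_scaled,
        standardized = pve_std_formula, unscaled = pve_raw_formula))
```

With a data frame like a from the question, the same quantities would be var(a$SNP_data) and var(a$Phenotypic_value), assuming SNP_data is coded as a numeric 0/1/2 allele count and has no missing values.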
Hi @Lemire, OK, so the correct formula is then
pve <- sqrt(var(x))*beta/sqrt(var(y))
and then pve^2, where var(x) is 2*f*(1-f)? Do you have any reference sources for that?
Thank you very much.
What about the second formula? Is it correct this way?