I am new to GWAS and statistics, and I find it hard to interpret the beta and SE values in typical GWAS output. The p-value makes sense to me: below a threshold level, the result is interesting. But how should I interpret the values of beta and SE?
In general, beta denotes the resulting coefficient from a fit and SE would be its standard error. Assuming that's about as clear as mud to you, let's restate that using statistics you're probably more familiar with...a T-test.
Suppose you have two experimental groups (we'll use human males and females) and perform a measurement on them (in this case, we'll just measure their height). If you were to graph the results you'd probably see that the males tend to be a bit taller than the females. If you calculated the mean of each group and subtracted one from the other, then the result would be the expected difference in height between the two groups. This is a simple example of a beta value. But of course unless you measured the height of ALL of the males and females in the world, this isn't an exact value (even ignoring measurement error). Rather, since we only measured a subset of all people, there's some error associated with our sampling. This ends up becoming the standard error of the estimate. In the case of a t-test, you can divide the beta value by the standard error and you have your t-statistic, which you would then use to find a p-value.
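To make the height example concrete, here is a minimal sketch in plain Python. The heights are simulated (the means and spreads are made-up numbers, not real data), the function name `two_group_summary` is just for illustration, and the p-value uses a normal approximation rather than the exact t distribution, which is reasonable only because the samples are large.

```python
import math
import random
import statistics

def two_group_summary(a, b):
    """Return (beta, se, t_stat, p_approx) for the difference in means a - b."""
    beta = statistics.mean(a) - statistics.mean(b)
    # SE of a difference in means: sample variances (SD squared) divided by n.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t_stat = beta / se
    # Two-sided p-value from a normal approximation (assumption: large samples).
    p_approx = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return beta, se, t_stat, p_approx

random.seed(1)
# Simulated heights in cm; the parameters are hypothetical.
males = [random.gauss(176.0, 7.0) for _ in range(200)]
females = [random.gauss(163.0, 6.5) for _ in range(200)]

beta, se, t_stat, p_approx = two_group_summary(males, females)
print(f"beta = {beta:.2f} cm, SE = {se:.2f} cm, t = {t_stat:.1f}, p ~ {p_approx:.2g}")
```

Here beta is the estimated height difference, and SE tells you how much that estimate would wobble if you repeated the sampling.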
The methods used to get the beta and SE values are rather more complicated for GWAS, of course, but the underlying principles are the same. So as with the height example, the beta value and its error give you an idea of the effect size. A p-value is nice, but you also want to know whether it reflects a small but very consistent effect (and if it's really, really small, do you even care about it?) or a large but highly variable one.
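As a rough illustration of where a GWAS beta and SE come from, the sketch below regresses a simulated quantitative phenotype on allele dosage (0, 1, or 2 copies of the minor allele) for a single SNP. This is an assumption-laden toy: real GWAS software adds covariates, handles relatedness, and so on, and the allele frequency, true effect, and function name here are all made up for the example.

```python
import math
import random

def simple_linear_regression(x, y):
    """Ordinary least squares of y on x; returns (slope, standard error of slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares, then the usual SE of the slope.
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se

random.seed(7)
maf = 0.3        # hypothetical minor allele frequency
true_beta = 0.5  # hypothetical effect per copy of the minor allele

# Dosage = number of minor alleles out of two independent draws.
genotypes = [sum(random.random() < maf for _ in range(2)) for _ in range(2000)]
phenotype = [true_beta * g + random.gauss(0.0, 1.0) for g in genotypes]

beta_hat, se_hat = simple_linear_regression(genotypes, phenotype)
print(f"beta = {beta_hat:.3f}, SE = {se_hat:.3f}, z = {beta_hat / se_hat:.1f}")
```

The beta is the estimated change in phenotype per extra copy of the allele, and the SE is the uncertainty of that slope.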
Thanks. So if the minor allele is associated with a negative beta value, it means the allele is associated with a lower phenotype value, so it is protective, and vice versa for a positive beta value?
And one more thing: how do I interpret the magnitude of the beta? Say one beta is -0.004 and another is 74.8. Sorry for getting into the details, just curious.
If that's how the design was set up, then yes. Normally when one sets up a test, it's designed such that the beta is negative if the allele is protective against the phenotype of interest (normally "diseased" or something like that), though if you reverse the levels then the opposite becomes true (always keep track of the base level of comparisons!).
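To show what "reversing the levels" does in practice, here is a small sketch that re-expresses one association result in terms of the other allele. The dictionary keys and the numbers are hypothetical, not the column names of any particular GWAS tool.

```python
def flip_effect_allele(record):
    """Return the same association expressed relative to the other allele."""
    return {
        "effect_allele": record["other_allele"],
        "other_allele": record["effect_allele"],
        "beta": -record["beta"],  # the sign flips
        "se": record["se"],       # the uncertainty does not change
        "effect_allele_freq": 1.0 - record["effect_allele_freq"],
    }

# Hypothetical summary statistic for one SNP.
hit = {
    "effect_allele": "A",
    "other_allele": "G",
    "beta": -0.12,
    "se": 0.03,
    "effect_allele_freq": 0.2,
}

flipped = flip_effect_allele(hit)
print(hit["effect_allele"], hit["beta"], "->", flipped["effect_allele"], flipped["beta"])
```

So a negative beta for allele A is the same finding as a positive beta of equal size for allele G, which is why you need to check which allele the beta refers to before calling anything protective.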
Just as with the height example, one suggests that the effect (whether significant or not) is very small while the other is large. I don't know the scales in this case, but -0.004 is close enough to 0 that my guess would be that it's not biologically relevant.
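A quick sketch of why the scale matters: the same effect looks very different depending on the units of the phenotype, while the ratio beta/SE stays the same. The numbers are made up, and the odds-ratio conversion at the end applies only under the assumption that the beta came from a logistic regression on a binary trait.

```python
import math

def rescale_effect(beta, se, factor):
    """Convert beta and SE to new phenotype units (new = old * factor)."""
    return beta * factor, se * factor

# Hypothetical effect on height, reported in centimetres.
beta_cm, se_cm = 0.4, 0.08
beta_m, se_m = rescale_effect(beta_cm, se_cm, 0.01)  # same effect in metres

z_cm = beta_cm / se_cm
z_m = beta_m / se_m
print(f"cm: beta = {beta_cm}, z = {z_cm:.2f}")
print(f"m:  beta = {beta_m}, z = {z_m:.2f}")

# Assumption: if beta were a log odds ratio (binary trait, logistic model),
# exponentiating gives the odds ratio per copy of the effect allele.
odds_ratio = math.exp(-0.004)
print(f"odds ratio for beta = -0.004: {odds_ratio:.4f}")
```

So whether -0.004 or 74.8 is "small" or "large" can only be judged against the units and the spread of the phenotype.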
BTW, make sure to adjust your p-values for multiple comparisons.
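For what adjusting can look like, here is a minimal sketch of two standard corrections, Bonferroni and Benjamini-Hochberg, written from scratch. The p-values are made up, and in practice many GWAS simply apply the conventional genome-wide threshold of 5e-8 instead of adjusting each p-value.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.02, 0.03, 0.5]  # hypothetical p-values
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni:", bonf)
print("BH:        ", bh)
```

Bonferroni is the stricter of the two; with millions of SNPs it is roughly what motivates the very small genome-wide threshold.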
So is the following a good reflection of this? I'm interested in understanding the relationship between P-value and standard error, and why genomic prediction methods like PRS don't take the standard error into account and use only the effect sizes from the top SNPs.
SE = sqrt[SD1^2/n1 + SD2^2/n2]
Test statistic = [Effect size/SE], and the P-value is derived from this statistic
SE depends on N; the bigger the N the smaller the SE
P-value depends on SE and effect size; for a given effect size, the bigger the SE the higher the P-value
Thus, P-value depends on N; for a given nonzero effect, the bigger the N the lower the P-value
Therefore, we can say in a GWAS that a lower P-value indicates a smaller SE or a larger absolute effect size
As most GWAS significant hits have a small effect size, the top ones tend to have small SE
Note: a high SE could be reflective of low N, not just high variability (although the total N is constant for a GWAS, the number of minor-allele carriers changes for each SNP, so rarer SNPs tend to have higher P-values because of high SE)
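The dependence of SE on both N and allele frequency can be sketched with a standard approximation for an additive model: under Hardy-Weinberg equilibrium the genotype variance is 2p(1-p), so SE is roughly residual SD / sqrt(N * 2p(1-p)). The function name and the grid of values below are just illustrative assumptions, and the approximation ignores covariates and small-sample corrections.

```python
import math

def approx_se(n, maf, residual_sd=1.0):
    """Approximate SE of an additive-model beta, assuming Hardy-Weinberg equilibrium."""
    genotype_variance = 2.0 * maf * (1.0 - maf)
    return residual_sd / math.sqrt(n * genotype_variance)

# Hypothetical sample sizes and minor allele frequencies.
for n in (1_000, 10_000, 100_000):
    for maf in (0.01, 0.1, 0.5):
        print(f"N = {n:>7}, MAF = {maf:<4}: SE ~ {approx_se(n, maf):.4f}")
```

At a fixed N, the rare SNP (MAF 0.01) has a much larger SE than the common one (MAF 0.5), and quadrupling N halves the SE.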
Thanks, and so kind of you to get down to the details.