GWAS With Covariates From Principal Component Analysis
11.1 years ago
eitan177 ▴ 20

Hi all,

I am trying to decide how (and whether) to incorporate principal components into association tests in a GWAS. My question is: is it unwise to standardize a principal component (calculated from SNP data) that will be used as a covariate in a linear regression? I suspect this is not OK. I realize this is a somewhat vague question, but I'm hoping someone who has performed GWAS with principal components as covariates will have some insight. Thanks.

gwas pca • 10.0k views

Normally, you wouldn't consider doing this unless you have, or expect, inflation of the test statistics, most likely due to population stratification.

Did you check for this, or do you have another reason for including the PCs?


Thanks for the response. Let's say there is population stratification and inflation: is tinkering with PCA values OK in this situation (and if so, should I check the post-tinkering values in some way to make sure they are still meaningful representations of the original values)? This perhaps expands the discussion to 'what makes a good covariate', but it still seems less clear to me that this sort of tinkering would be wise for PCA values as opposed to, say, height or weight.


What do you mean by "standardize a principal component"?


A z-transformation of the PCA values (calculated by someone else, perhaps using EIGENSTRAT, but I'm not sure; I simply see them as a numerical field alongside other fields like 'height' and 'weight'), with one quirk: the mean and standard deviation used for the z-transformation are calculated from only a subset of the individuals used to calculate the PCA values originally. Here is what's happening: PCA values are calculated upstream by someone -> I get the data -> I perform association testing on a subset of individuals using regression, with the PCA values as covariates -> I notice the tool I am using transforms all covariates to z-scores before putting them into the regression.
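One way to reason about the z-scoring step: a z-transformation is just a shift and a rescale of the covariate, so in an ordinary least-squares model that includes an intercept it should leave the SNP effect estimate and its test statistic unchanged, whichever individuals supplied the mean and standard deviation. The sketch below checks this on simulated data. Everything in it (the sample size, effect sizes, the `snp_t_stat` helper, and the choice of the first 200 individuals as the "subset") is made up for illustration and is not taken from any particular GWAS tool.

```python
import numpy as np

def snp_t_stat(y, snp, covar):
    """Fit y ~ intercept + snp + covar by OLS; return (beta, t) for the SNP term."""
    X = np.column_stack([np.ones_like(y), snp, covar])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the estimates
    return beta[1], beta[1] / np.sqrt(cov_beta[1, 1])

rng = np.random.default_rng(0)
n = 500

# Simulated stand-in for a principal component (hypothetical scale).
pc1 = rng.normal(0.0, 0.05, n)
# Allele frequency depends on pc1, mimicking population stratification.
freq = np.clip(0.3 + 2.0 * pc1, 0.05, 0.95)
snp = rng.binomial(2, freq).astype(float)
y = 0.2 * snp + 3.0 * pc1 + rng.normal(size=n)

# z-score pc1 using the mean and SD of only a subset of individuals.
subset = slice(0, 200)
pc1_z = (pc1 - pc1[subset].mean()) / pc1[subset].std(ddof=1)

raw = snp_t_stat(y, snp, pc1)      # raw PC as covariate
z = snp_t_stat(y, snp, pc1_z)      # subset-standardized PC as covariate

print("raw PC:      beta=%.6f t=%.6f" % raw)
print("z-scored PC: beta=%.6f t=%.6f" % z)
```

The two lines should agree up to floating-point error. This only covers a plain linear model with an intercept; it does not address penalized regression or models with interaction terms involving the PC, where rescaling can matter.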


If you are working on a subset, check the inflation factor with and without the PCA covariates. (PLINK's --assoc --adjust options will report the inflation factor in the log file.) If the PCs don't help, you can drop them, or recalculate the PCA for your subset of data.
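If you want to compare the inflation factor yourself from the p-values of two runs (with and without PCs), a minimal sketch of the usual median-based definition is below: convert each p-value to a 1-df chi-square statistic and divide the median by the median of a 1-df chi-square distribution (about 0.455). The function name `genomic_inflation` and the toy p-value lists are assumptions for illustration; this is not a claim about how any specific tool computes its reported value.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_inflation(pvalues):
    """Median-based genomic inflation factor from two-sided 1-df p-values."""
    nd = NormalDist()
    # For a 1-df test, chi2 = z^2 where z is the normal quantile at p/2.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in pvalues if 0.0 < p <= 1.0]
    return median(chi2) / CHI2_1DF_MEDIAN

# Toy check: evenly spaced p-values behave like a well-calibrated null.
n = 10001
uniform_p = [(i + 0.5) / n for i in range(n)]
lam_null = genomic_inflation(uniform_p)

# Toy inflated set: the same p-values pushed toward zero.
inflated_p = [p ** 2 for p in uniform_p]
lam_inflated = genomic_inflation(inflated_p)

print("lambda (null-like):", round(lam_null, 3))
print("lambda (inflated): ", round(lam_inflated, 3))
```

In practice you would pass in the p-value column from each association run and see whether adding the PCs moves the value toward 1.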

11.1 years ago
Bioch'Ti ★ 1.1k

Hi, if your association panel is not structured (your PCA results should tell you), then only the kinship matrix (K) needs to be included in the GLM/MLM model used for the GWA. Otherwise, you have to account for population structure as a covariate, whatever method you used to estimate it (PCA, Q, MDS, ...). See: http://www.nature.com/nrg/journal/v11/n7/abs/nrg2813.html Best


Thanks for the reference. It looks like the genomic control statistic is a good benchmark for evaluating my model, i.e. it doesn't really matter how I get the covariates/predictors in my model, as long as GC ≈ 1.
