Hi all,
I am trying to decide how, or whether, to incorporate principal components into association tests in a GWAS. My question is: is it unwise to standardize a principal component (calculated from SNP data) that will be used as a covariate in a linear regression? I suspect this is not OK. I realize this is a somewhat vague question, but I'm hoping someone who has performed GWAS with principal components as covariates will have some insight. Thanks.
Normally you wouldn't consider doing this unless you have, or expect, inflation of the test statistics, most likely due to population stratification.
Did you check for this, or do you have another reason for including them?
Thanks for the response. Let's say there is population stratification and inflation: is tinkering with PCA values OK in this situation? And if so, should I check the transformed values in some way to make sure they are still meaningful representations of the original values? This is perhaps expanding the discussion to 'what makes a good covariate', but it seems less clear to me that this sort of tinkering would be wise for PCA values than for, say, height or weight.
What do you mean by "standardize a principal component"?
A z-transformation of the PCA values (calculated by someone else, perhaps using EIGENSTRAT, but I'm not sure; I simply see them as a numerical field alongside other fields like 'height' and 'weight'), with one quirk: the mean and standard deviation used for the z-transformation are calculated from only a subset of the individuals that were used to calculate the PCA values originally. Here is what's happening: PCA values are calculated upstream by someone -> I get the data -> I perform association testing on a subset of individuals using regression, with the PCA values as covariates -> I notice the tool I am using transforms all covariates to z-scores before passing them to the regression.
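To make the z-scoring step concrete, here is a minimal sketch with made-up data (the sample size, effect sizes and variable names are all invented for illustration; this is not the tool's actual code). A z-score is just a shift and rescale of the covariate, and in a linear regression that includes an intercept, shifting and rescaling a covariate should leave the fitted SNP coefficient unchanged, whichever individuals the mean and SD came from. The SNP coefficient is computed here by residualizing on the PC (the Frisch-Waugh-Lovell approach), which is only one way to get it.

```python
import random
from statistics import fmean, pstdev

def residualize(values, covariate):
    """Residuals from a simple linear regression of values on covariate (with intercept)."""
    mv, mc = fmean(values), fmean(covariate)
    sxx = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (v - mv) for c, v in zip(covariate, values)) / sxx
    return [(v - mv) - slope * (c - mc) for c, v in zip(covariate, values)]

def snp_effect(pheno, geno, pc):
    """SNP coefficient in pheno ~ 1 + geno + pc, obtained by residualizing both on pc."""
    ry = residualize(pheno, pc)
    rg = residualize(geno, pc)
    return sum(g * y for g, y in zip(rg, ry)) / sum(g * g for g in rg)

# Simulated data; every number below is an assumption for illustration only.
random.seed(0)
n = 200
pc = [random.gauss(0, 0.05) for _ in range(n)]        # stand-in for upstream PC scores
geno = [random.choice([0, 1, 2]) for _ in range(n)]   # genotype coded as 0/1/2
pheno = [0.3 * g + 4.0 * p + random.gauss(0, 1) for g, p in zip(geno, pc)]

# z-score the PC using the mean and SD of the analysed subset itself
m, s = fmean(pc), pstdev(pc)
pc_z = [(p - m) / s for p in pc]

beta_raw = snp_effect(pheno, geno, pc)
beta_z = snp_effect(pheno, geno, pc_z)
print(beta_raw, beta_z)  # the two estimates should agree up to rounding error
```

What this sketch does not address is whether PCs computed on the full sample still capture the structure within the subset; that is a separate question from the rescaling.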
If you are working on a subset, check the inflation factor with and without the PCA covariates. (PLINK's --assoc --adjust option will report the inflation factor in the log file.) If including them doesn't help, you can drop the PCA covariates or recalculate them for your subset of the data.
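If you want to compute the inflation factor yourself from a column of p-values, a rough sketch follows. It assumes the usual median-based definition (median of the 1-df chi-squared statistics divided by the median of a 1-df chi-squared distribution, about 0.455) and two-sided p-values strictly greater than 0. The function name and the toy p-values are made up for illustration.

```python
from statistics import NormalDist, median

MEDIAN_CHI2_1DF = 0.4549364  # median of a chi-squared distribution with 1 df

def genomic_inflation(p_values):
    """Median-based genomic inflation factor from two-sided p-values in (0, 1]."""
    norm = NormalDist()
    # For 1 df, chi-squared = z**2 where z is the normal quantile at p/2.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF

# Toy input: evenly spaced p-values mimic a well-calibrated (null) test.
p_null = [(i + 0.5) / 1000 for i in range(1000)]
lam_null = genomic_inflation(p_null)

# Squaring the p-values makes them systematically too small, mimicking inflation.
lam_inflated = genomic_inflation([p ** 2 for p in p_null])
print(lam_null, lam_inflated)  # first value close to 1, second clearly above 1
```

Running this once on the p-values from the model without PCs and once on those from the model with PCs gives the with/without comparison suggested above.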