Question

PLINK Principal Components not adequately controlling for population stratification in linear regression?

9.4 years ago

dam4l ▴ 200

I'm doing a GWAS with ~15 million variants and ~800 people. I am unfamiliar with Linux, so I have tried using the PLINK MDS and PCA functions to obtain principal components to use as covariates in the association analysis, to control for population stratification. When I plotted the p-values from the association analysis as a QQ plot, the distribution was pretty messy (departing from the expected diagonal), which suggests that I did not adequately control for population stratification. I took the following steps:
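The "messy" QQ plot can be put into a single number with the genomic inflation factor (lambda_GC): the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). Values well above 1 point to residual stratification or other inflation. Below is a minimal stdlib-only Python sketch. It assumes PLINK 1.9 `--linear` output (here `association.assoc.linear`, from `--out association`) with whitespace-separated columns that include `TEST` and `P`, and it keeps only the `ADD` rows, because with covariates PLINK also writes one row per covariate for each variant.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (~0.4549).
EXPECTED_MEDIAN_CHISQ_1DF = NormalDist().inv_cdf(0.75) ** 2


def p_to_chisq_1df(p: float) -> float:
    """Convert a two-sided p-value into the equivalent 1-df chi-square statistic."""
    # Use the lower tail (p / 2) so that very small p-values do not round 1 - p/2 to 1.0.
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def lambda_gc(pvalues) -> float:
    """Genomic inflation factor: median observed chi-square / expected null median."""
    chisq = [p_to_chisq_1df(p) for p in pvalues if 0.0 < p <= 1.0]
    return median(chisq) / EXPECTED_MEDIAN_CHISQ_1DF


def read_additive_pvalues(path: str):
    """Collect P values from the ADD (genotype) rows of a PLINK --linear output file."""
    pvals = []
    with open(path) as handle:
        header = handle.readline().split()
        test_col, p_col = header.index("TEST"), header.index("P")
        for line in handle:
            fields = line.split()
            if fields[test_col] == "ADD" and fields[p_col] != "NA":
                pvals.append(float(fields[p_col]))
    return pvals


# Hypothetical usage, assuming the "association" --out prefix from the command below:
# pvals = read_additive_pvalues("association.assoc.linear")
# print(f"lambda_GC = {lambda_gc(pvals):.3f}")
```

A value close to 1.0 means the bulk of the test statistics matches the null. With a polygenic trait some inflation is expected, so lambda is a rough diagnostic rather than proof of stratification.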

1. Pruned variants in LD using PLINK --indep (which writes plink.prune.in).
2. Created a genome file:

./plink --bfile file --genome --extract plink.prune.in

3. Used --pca to generate an eigenvec file containing the PCs:

./plink --bfile gendep_merged --cluster --pca header --extract plink.prune.in --read-genome plink.genome

4. Performed the association analysis using 10 PCs from the eigenvec file as covariates:

./plink --bfile file --pheno phenotype.txt --allow-no-sex --covar plink.eigenvec --covar-name PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10 --out association --linear --adjust

Am I missing a step, or should any of the flags I used be modified to produce PCs that adequately control for population stratification in this sample?

Any input would be greatly appreciated.

Tags: plink SNP gwas pca population stratification

score 1 · Accepted Answer · 2016-03-16

9.4 years ago

andrew.j.skelton73 6.6k

How exactly is using the first ten principal components controlling for "population stratification"? If I understand correctly, you're performing an association test and telling the model fit to smooth out the ten biggest drivers of variance in your dataset. When you checked the principal components, did they show that the first ten captured the differences between populations? Could you be smoothing out the effect you're testing for instead?
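One way to start answering that question is to look at the eigenvalues PLINK writes next to the eigenvectors when --pca runs (plink.eigenval, one value per line for each PC it computed; 20 by default in PLINK 1.9). The Python sketch below assumes that file layout and reports each PC's share of the summed eigenvalues, plus the running total. These shares are relative to the computed PCs only, not to total genotype variance. Treat them as a rough guide to where the eigenvalues level off, not as a formal test of which PCs reflect ancestry.

```python
def read_eigenvalues(path: str):
    """Read a PLINK .eigenval file: one eigenvalue per non-empty line."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


def variance_shares(eigenvalues):
    """Return (share, cumulative share) of the summed eigenvalues for each PC."""
    total = sum(eigenvalues)
    shares, running = [], 0.0
    for value in eigenvalues:
        share = value / total
        running += share
        shares.append((share, running))
    return shares


# Hypothetical usage with the default output prefix:
# for i, (share, cumulative) in enumerate(variance_shares(read_eigenvalues("plink.eigenval")), 1):
#     print(f"PC{i}: {share:.1%} (cumulative {cumulative:.1%})")
```

A sharp drop after the first few eigenvalues suggests that only those PCs carry broad structure. Whether that structure is ancestry still has to be checked, for example by plotting the PCs against known population labels.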

Using 10 does indeed seem a bit excessive. You should only use the PCs that actually stratify your population. If that's none of them, then do not include any.

Reply by Kevin Blighe (89k), 6.9 years ago
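As a sketch of how one might decide which PCs "actually stratify" the sample, one heuristic is to measure how strongly each PC separates samples with known or self-reported ancestry. This is not something PLINK does for you. The code below assumes plink.eigenvec was written with the `header` modifier (columns FID IID PC1 ... PCk) and that IIDs are unique. It also needs a hypothetical `population_of` mapping from IID to an ancestry label, which you would have to supply. It ranks PCs by eta squared, the fraction of each PC's variance explained by the labels. PCs near zero are candidates to drop, and where to draw the line remains a judgment call.

```python
from collections import defaultdict
from statistics import fmean


def read_eigenvec(path: str):
    """Parse a PLINK .eigenvec written with 'header': FID IID PC1 ... PCk."""
    with open(path) as handle:
        header = handle.readline().split()
        rows = [line.split() for line in handle if line.strip()]
    pc_names = header[2:]
    # Keyed by IID only; this assumes IIDs are unique across families.
    scores = {row[1]: [float(x) for x in row[2:]] for row in rows}
    return pc_names, scores


def eta_squared(values, labels):
    """Fraction of the variance in `values` explained by group `labels` (one-way ANOVA R^2)."""
    grand = fmean(values)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in groups.values())
    return ss_between / ss_total if ss_total > 0 else 0.0


def pcs_by_stratification(pc_names, scores, population_of):
    """Rank PCs by eta^2 against a hypothetical IID -> population label mapping."""
    iids = [iid for iid in scores if iid in population_of]
    labels = [population_of[iid] for iid in iids]
    ranked = []
    for k, name in enumerate(pc_names):
        values = [scores[iid][k] for iid in iids]
        ranked.append((name, eta_squared(values, labels)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical usage (population_of must come from your own sample metadata):
# names, scores = read_eigenvec("plink.eigenvec")
# for name, eta2 in pcs_by_stratification(names, scores, population_of):
#     print(name, round(eta2, 3))
```

Without external ancestry labels this check is not available, and a scatter plot of the leading PCs is the usual fallback for spotting clusters.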