Rationale for including a certain number of PCs in a GWAS study?
10.0 years ago
LauferVA 4.5k

I have seen papers that include 1, 2, 3, 4, 5, 8, and 20 PCs as covariates in GWA studies. I have probably seen others that include a different number, but I was not aware of it at the time.

Some papers appear to do this based on the appearance of scree plots, but more commonly no explanation is given for why a certain number was included.

Certain papers out of the Broad Institute include as many as 20 PCs and do not provide a rationale.

My questions are:

1. What are the various rationales for including a certain number of PCs in a GWAS?

2. Is the inclusion of more PCs regarded as conservative or anti-conservative? If so, why?

3. Are there good papers that explain this in a rigorous fashion in the context of genetic studies, controlling for ethnicity etc.?

PCA Statistics GWAS • 5.7k views
10.0 years ago
Sam ★ 4.8k

The reason for including a certain number of PCs as covariates in the GWAS association test is to adjust for population stratification. Most of the time, you can look at the PCA scatter plots and see which PCs best separate your samples by population. For example, in the first picture in this post, the samples seem to be separated along both PCs, so you would likely include both PC1 and PC2 as covariates to account for the population difference. Another way is to look at the scree plot, which tells you the amount of variance explained by each PC. Some choose just enough PCs to account for 80~90% of the variance.
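
As a rough illustration of the scree-plot approach, here is a minimal sketch in Python with NumPy. It computes the proportion of variance explained by each PC and picks the smallest number of PCs that reaches a cumulative threshold. The function names, the simulated two-population genotypes, and the 0.8 threshold are all assumptions made for the example, not something taken from EIGENSTRAT or any particular paper.

```python
import numpy as np

def explained_variance_ratio(genotypes):
    """Proportion of total variance captured by each PC of a samples x SNPs matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def n_pcs_for_threshold(ratios, threshold=0.8):
    """Smallest number of leading PCs whose cumulative variance reaches the threshold."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against floating-point rounding when the threshold is close to 1.
    return min(k, len(ratios))

# Hypothetical data: two populations whose allele frequencies differ (assumption).
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freq_a, (n_per_pop, n_snps)),
    rng.binomial(2, freq_b, (n_per_pop, n_snps)),
]).astype(float)

ratios = explained_variance_ratio(genotypes)
k = n_pcs_for_threshold(ratios, threshold=0.8)
print("variance explained by the first 5 PCs:", np.round(ratios[:5], 3))
print("PCs needed to reach 80% of the variance:", k)
```

Plotting `ratios` against the PC index gives the scree plot itself; the threshold rule is just one way of turning that plot into a number.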

I think that by including more PCs, you adjust for more and more subtle differences between the samples (the first few PCs should usually account for most of the population difference), so including more might risk overfitting.
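
To make concrete what adding PCs as covariates does, here is a small simulated sketch, again in Python with NumPy. A SNP with no true effect differs in frequency between two populations, and the phenotype mean also differs between them. The association is fitted by ordinary least squares with and without PC1. Everything here (function names, sample sizes, allele frequencies, effect sizes) is a made-up assumption for illustration, not the EIGENSTRAT procedure.

```python
import numpy as np

def top_pcs(genotypes, n_pcs):
    """Scores of the leading n_pcs principal components (samples x n_pcs)."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def snp_effect(phenotype, snp, covariates=None):
    """OLS coefficient of the SNP, optionally adjusting for covariate columns."""
    columns = [np.ones_like(phenotype), snp]
    if covariates is not None and covariates.shape[1] > 0:
        columns.extend(covariates.T)
    design = np.column_stack(columns)
    coefs, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coefs[1]

# Hypothetical stratified sample: two populations of 500 individuals each.
rng = np.random.default_rng(1)
n_per_pop, n_snps = 500, 200
population = np.repeat([0.0, 1.0], n_per_pop)
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
genotypes = np.vstack([
    rng.binomial(2, freq_a, (n_per_pop, n_snps)),
    rng.binomial(2, freq_b, (n_per_pop, n_snps)),
]).astype(float)

# The tested SNP has no causal effect; only its frequency differs by population.
test_snp = np.concatenate([
    rng.binomial(2, 0.2, n_per_pop),
    rng.binomial(2, 0.8, n_per_pop),
]).astype(float)
# The phenotype depends on population membership only, not on the SNP.
phenotype = 2.0 * population + rng.normal(0, 1, 2 * n_per_pop)

beta_unadjusted = snp_effect(phenotype, test_snp)
beta_adjusted = snp_effect(phenotype, test_snp, top_pcs(genotypes, 1))
print("SNP effect without PCs:", round(float(beta_unadjusted), 3))
print("SNP effect with PC1 as covariate:", round(float(beta_adjusted), 3))
```

Changing the `1` in `top_pcs(genotypes, 1)` lets you see how the estimate behaves as more PCs are added; each extra PC also uses up a degree of freedom.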

For more details, you might want to read the EIGENSTRAT paper, since it is one of the main tools that first performed this kind of analysis.
