Hi all,
I am wondering why principal components (PCs) are included as covariates in QTL analysis, and how to decide how many PCs to include. For example, below is a short excerpt from a paper. I understand that including imputation status adjusts for potential biases introduced by imputation, but what do the PCs adjust for? Thank you in advance!
The details of sample sets, data filtering and normalization are discussed above. Briefly, we did transcriptome QTL mapping separately for European (n=373) and Yoruba (n=89) populations. We used genetic variants with MAF>5% in either EUR or YRI and within 1 Mb of the transcription start site, with covariates of imputation status (0|1), PCs 1-3 for Europeans and PCs 1-2 for Yoruba.
Check out this paper: "Principal components analysis corrects for stratification in genome-wide association studies."
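The idea from that paper, applied to eQTL mapping, can be sketched as below. This is a minimal illustration on simulated data, assuming numpy only; the helper names `genotype_pcs` and `eqtl_test` are hypothetical, and it uses a plain per-variant OLS t-test, not the actual pipeline of the paper you quoted. Two subpopulations differ in allele frequencies genome-wide, and expression is shifted by ancestry. The test variant has no real effect, but because its frequency also differs between the groups it looks associated until the top genotype PCs (which capture ancestry) are added as covariates. The printed variance-explained ratios show one informal way people inspect how many PCs carry structure; that is a heuristic, not a rule.

```python
import numpy as np

def genotype_pcs(G, k):
    """Top-k PCs of a samples x variants dosage matrix (values 0/1/2).

    Each variant is centred and scaled before the SVD. Returns the
    sample scores on the top-k PCs and their variance-explained ratios.
    """
    mu, sd = G.mean(axis=0), G.std(axis=0)
    keep = sd > 0                              # drop monomorphic variants
    Z = (G[:, keep] - mu[keep]) / sd[keep]
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    ratio = S**2 / np.sum(S**2)
    return U[:, :k] * S[:k], ratio[:k]

def eqtl_test(expr, geno, covariates):
    """OLS of expression on one variant's dosage plus covariates.

    Returns (beta, t statistic) for the genotype term (hypothetical helper).
    """
    n = expr.shape[0]
    X = np.column_stack([np.ones(n), geno, covariates])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se

rng = np.random.default_rng(0)
n_per, m = 200, 500

# Simulated genome-wide variants: allele frequencies differ between two groups
p_a = rng.uniform(0.1, 0.9, m)
p_b = np.clip(p_a + rng.normal(0, 0.15, m), 0.05, 0.95)
G = np.vstack([rng.binomial(2, p_a, (n_per, m)),
               rng.binomial(2, p_b, (n_per, m))]).astype(float)
pop = np.repeat([0.0, 1.0], n_per)

# Test variant: frequency differs by group but has NO effect on expression
snp = np.concatenate([rng.binomial(2, 0.2, n_per),
                      rng.binomial(2, 0.7, n_per)]).astype(float)
# Expression shifted by ancestry (e.g. environment), plus noise
expr = 2.0 * pop + rng.normal(0, 1, 2 * n_per)

pcs, ratio = genotype_pcs(G, k=3)
_, t_raw = eqtl_test(expr, snp, np.empty((2 * n_per, 0)))  # no covariates
_, t_adj = eqtl_test(expr, snp, pcs)                        # PCs 1-3 as covariates

print("variance explained by PCs 1-3:", ratio.round(3))
print("t without PCs:", round(t_raw, 2), " t with PCs:", round(t_adj, 2))
```

Without PCs the null variant gets a large t statistic purely from population structure; with the PCs in the model it shrinks toward zero. In real data the same covariate matrix would typically also hold the other covariates (e.g. imputation status in the excerpt).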
Thanks for the paper!