Question

EIGENSTRAT: Should We Multiply the Eigenvector by sqrt(Eigenvalue) to Form the PC Axis?


12.3 years ago

nnlnn ▴ 60

As implied by the paper behind the software, Price et al. (2006), one would use the eigenvectors ("ancestries of individuals") from EIGENSTRAT directly as covariates in a subsequent linear or logistic regression. However, these eigenvectors are orthonormal, so they all have unit length and hence the same variance. In other words, the variation along each axis (eigenvector) is the same, which is not how it should be: the variation along an axis should be proportional to its associated eigenvalue (lambda). So I think the correct thing is to multiply eigenvec_k by the square root of lambda_k and feed the result into the regression model as a covariate. Moreover, it can be shown that eigenvec_k * sqrt(lambda_k) is just the kth score vector for the individuals if one runs PCA on the genotype matrix of size n x p rather than on its transpose, p x n (n = sample size; p = number of SNPs); the latter is what is used in the Price paper.
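The identity claimed above can be checked numerically. The following is a minimal NumPy sketch on simulated toy genotypes, not EIGENSTRAT output: the sizes, the random data, and all variable names are assumptions made for illustration, and the per-SNP variance normalization used by Price et al. is left out, so only the column centering is shared with their method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 200                                      # toy sizes: n individuals, p SNPs
G = rng.integers(0, 3, size=(n, p)).astype(float)   # simulated 0/1/2 genotypes
X = G - G.mean(axis=0)                              # center each SNP across individuals

# Route 1: eigen-decompose the n x n individual-by-individual matrix.
K = X @ X.T
eigvals, eigvecs = np.linalg.eigh(K)                # eigh returns ascending order
order = np.argsort(eigvals)[::-1]                   # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: ordinary PCA on the n x p matrix; scores are projections on SNP loadings.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T

k = 3
scaled = eigvecs[:, :k] * np.sqrt(eigvals[:k])      # eigenvec_k * sqrt(lambda_k)

# Eigenvectors are only defined up to sign, so align signs before comparing.
signs = np.sign(np.sum(scaled * scores[:, :k], axis=0))
max_diff = np.max(np.abs(scaled * signs - scores[:, :k]))

# Sum of squares along each score axis equals the corresponding eigenvalue.
sum_sq = (scores[:, :k] ** 2).sum(axis=0)
print(max_diff, sum_sq, eigvals[:k])
```

Under these assumptions the scaled eigenvectors and the PCA scores agree up to sign and floating-point error, and the spread along each score axis tracks its eigenvalue.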

Although the whole point of running EIGENSTRAT is to adjust for structure when testing a SNP's effect, and hence the significance of a SNP does not depend on the multiplication by sqrt(lambda) mentioned above, I think we should still use the right PC axes. I would be very grateful for any corrections and comments on this topic.
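The invariance mentioned above (rescaling a covariate does not change the SNP's test statistic) can be illustrated with a plain ordinary least squares fit. This is only a sketch: it uses simulated data and a hand-rolled OLS t-statistic rather than EIGENSTRAT's own statistic, and `snp_t_stat`, `lam`, and the effect sizes are hypothetical names and values chosen for the example.

```python
import numpy as np

def snp_t_stat(y, snp, covariate):
    """OLS t-statistic for the SNP term in y ~ 1 + snp + covariate."""
    X = np.column_stack([np.ones_like(y), snp, covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients
    return beta[1] / np.sqrt(cov_beta[1, 1])

rng = np.random.default_rng(1)
n = 100
pc = rng.normal(size=n)
pc /= np.linalg.norm(pc)                            # unit length, like an eigenvector
snp = rng.integers(0, 3, size=n).astype(float)      # simulated 0/1/2 genotype
y = 0.5 * snp + 10.0 * pc + rng.normal(size=n)      # simulated phenotype

lam = 37.0                                          # hypothetical eigenvalue
t_raw = snp_t_stat(y, snp, pc)                      # covariate = eigenvector
t_scaled = snp_t_stat(y, snp, pc * np.sqrt(lam))    # covariate = eigenvector * sqrt(lambda)
print(t_raw, t_scaled)
```

Rescaling the covariate only rescales its own coefficient by 1/sqrt(lambda); the fitted values, the residuals, and therefore the SNP's t-statistic stay the same up to floating-point error.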

pca • 5.3k views

updated 12.3 years ago by Hypotheses ▴ 90 • written 12.3 years ago by nnlnn ▴ 60

score 0 · Answer 1 · 2012-07-26

I'm not sure I quite understand your question, but to calculate the score for each individual *i* you do something along these lines: eigenvec_k' × GENOTYPE_i. And it is this individual-specific score that you would use to adjust for population structure, isn't it? Or am I misunderstanding something?

My understanding of lambda_k is that it is the variation explained by the k-th principal component; that's pretty much what the eigenvalues describe.