Hi all,
I have a question about how to perform a Principal Component Analysis on a genotype matrix (genotypes coded as 0, 1, 2) to study the structure of a population.
I have come across two different ways of performing this analysis, and I don't understand the differences between them.
1st method) Eigenvectors of the individual-correlation matrix
The genotype matrix (M individuals x N SNPs) is used to calculate a correlation matrix between individuals (MxM). Then the eigenvectors of this matrix are calculated. These eigenvectors are used to describe the population structure.
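To make this concrete, here is a minimal NumPy sketch of what I mean by the first method. The random toy genotypes, the variable names, and the choice to keep two components are my own assumptions for illustration, not taken from any package.

```python
import numpy as np

# Toy data (assumption): 8 individuals x 50 SNPs, genotypes coded 0/1/2
rng = np.random.default_rng(0)
M, N = 8, 50
G = rng.integers(0, 3, size=(M, N)).astype(float)

# Correlation between individuals: np.corrcoef treats each row as a variable,
# so each individual is centered and scaled across its own SNPs
C_ind = np.corrcoef(G)                      # shape (M, M)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C_ind)
order = np.argsort(eigvals)[::-1]           # sort from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each row holds one individual's coordinates on the first two eigenvectors
pcs_method1 = eigvecs[:, :2]                # shape (M, 2)
```

Here the individuals are the variables of the correlation matrix, so the eigenvectors themselves are what gets plotted.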
2nd method) Multiplying the genotype matrix by the eigenvectors of the SNP-correlation matrix
The genotype matrix is used to calculate a correlation matrix between SNPs (NxN). Then the eigenvector matrix of this correlation matrix is calculated. The eigenvectors describe how the correlation between SNPs is structured. Then I multiply the genotype matrix (centered and scaled by SNP) by the eigenvectors. The multiplication is therefore between an (MxN) matrix and an (NxN) matrix, and it produces an MxN matrix: one row per individual and one column per eigenvector. Each eigenvector assigns a specific weight to every SNP, and these weights are multiplied by the genotypes of each individual and summed. The results of this multiplication are used to describe the population structure.
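Again as a minimal NumPy sketch of what I mean by the second method. The toy data, the variable names, the removal of monomorphic SNPs, and the choice of two components are my own assumptions for illustration.

```python
import numpy as np

# Toy data (assumption): 8 individuals x 50 SNPs, genotypes coded 0/1/2
rng = np.random.default_rng(0)
M, N = 8, 50
G = rng.integers(0, 3, size=(M, N)).astype(float)

# Drop SNPs with no variation, since they cannot be scaled (assumed preprocessing)
G = G[:, G.std(axis=0) > 0]
n_snps = G.shape[1]

# Center and scale each SNP (column) across individuals
Z = (G - G.mean(axis=0)) / G.std(axis=0, ddof=1)

# Correlation between SNPs: rowvar=False treats each column as a variable
C_snp = np.corrcoef(G, rowvar=False)        # shape (n_snps, n_snps)

# Eigenvectors of the SNP-correlation matrix, sorted from largest eigenvalue
eigvals, V = np.linalg.eigh(C_snp)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Project individuals onto the first two eigenvectors:
# (M x n_snps) @ (n_snps x 2) -> (M x 2), one row per individual
scores_method2 = Z @ V[:, :2]
```

In this version the SNPs are the variables, so the eigenvectors are SNP weights and the individuals' coordinates come from the projection.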
The second method is computationally intensive (the NxN correlation matrix can be very expensive to compute when there are many SNPs), and the R packages I know of for this analysis use the first method. Do the two methods give different results? If so, could you explain how and why they differ?
Thank you
OS