Hello
I'm reading a paper, but I'm having difficulty interpreting a principal component plot. I'm not familiar with principal component analysis, so I looked up some information to better understand what it does.

In the paper, they say: "We performed a WGS-based genome-wide association study (GWAS) using a logistic model with principal component correction to account for any remaining population stratification after restriction to individuals with > 95% European ancestry, though inspection of the principal component plots demonstrates the cohorts are well balanced". So the two colors represent the two cohorts being compared.

I read in another paper that the principal component 1 axis reflects variation between two populations from different geographical locations. But what variation does the principal component 2 axis reflect?

And is it because the red dots and blue dots are equally spread that they conclude the cohorts are balanced? If the red dots were on one side of the principal component 1 axis and the blue dots on the other, then the differences in allele frequencies could be due to the different geographical origins of these two cohorts. Am I interpreting this correctly?
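To make my interpretation concrete, here is a toy sketch of the reasoning as I understand it. Everything in it is an assumption of mine and not from the paper: the two simulated ancestral populations, the exaggerated allele frequency differences, the sample sizes, and the helper names `simulate_genotypes`, `pc1_scores` and `separation`. It computes PC1 once and then compares two hypothetical ways of assigning the same individuals to cohorts A and B.

```python
import random
import statistics


def simulate_genotypes(freqs, n, rng):
    """Draw n individuals; each genotype is 0, 1 or 2 copies of an allele."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def pc1_scores(genotypes, rng, iters=100):
    """Return a unit vector proportional to each individual's PC1 score.

    Uses power iteration on the sample-by-sample Gram matrix of the
    SNP-centered genotype matrix (a simple stand-in for a real PCA routine).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP (column) at its mean across all individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Gram matrix G = X X^T; its top eigenvector is proportional to PC1 scores.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        w = [sum(g * vi for g, vi in zip(row, v)) for row in gram]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return v


def separation(scores, labels):
    """Gap between cohort means on PC1, in units of the pooled within-cohort SD."""
    a = [s for s, lab in zip(scores, labels) if lab == "A"]
    b = [s for s, lab in zip(scores, labels) if lab == "B"]
    pooled_sd = ((statistics.pvariance(a) + statistics.pvariance(b)) / 2) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd


rng = random.Random(0)
n_snps = 300
# Assumption: two ancestral populations whose allele frequencies differ by 0.3
# at every SNP (far larger than anything expected within Europeans).
freq_pop1 = [rng.uniform(0.35, 0.65) for _ in range(n_snps)]
freq_pop2 = [p + rng.choice((-0.3, 0.3)) for p in freq_pop1]
samples = (simulate_genotypes(freq_pop1, 30, rng)
           + simulate_genotypes(freq_pop2, 30, rng))
scores = pc1_scores(samples, rng)

# Stratified design: cohort A is entirely population 1, cohort B population 2.
stratified_labels = ["A"] * 30 + ["B"] * 30
# Balanced design: each cohort contains half of each population.
balanced_labels = ["A", "B"] * 30

print("stratified separation on PC1:", separation(scores, stratified_labels))
print("balanced separation on PC1:  ", separation(scores, balanced_labels))
```

If I understand correctly, the stratified labelling corresponds to red dots on one side of PC1 and blue dots on the other, while the balanced labelling corresponds to the two colors overlapping, which is what the paper seems to describe.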