I have genome-wide association data on ~11k samples from 4 distinct ethnic groups. I've computed principal components for everyone using an Eigenstrat-like procedure.
But if I want to do an ethnic-specific analysis, can I used those same PCs for a subset of those 11k samples to correct for within-ethnic-group population stratification?
My gut says to recompute the PCs separately in each subset, but now that I'm thinking about it, what exactly could go wrong by doing it this way? Would test statistics be biased?
I would suggest asking here: http://stats.stackexchange.com/