SNP pruning is a standard step before ancestry estimation with PCA, as part of quality control for a genome-wide association study (GWAS). Typically this is done in PLINK or Hail by filtering variants based on the pairwise linkage disequilibrium (r^2) between SNPs within a given window.
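To make clear which calculation I mean, here is a minimal sketch of window-based pairwise pruning. It is a simplified greedy version written for illustration, not the exact algorithm PLINK or Hail use; the function name `ld_prune`, its parameters, and the default values are my own assumptions.

```python
import numpy as np


def ld_prune(genotypes, window=50, r2_threshold=0.2):
    """Greedy LD-pruning sketch (illustrative, not PLINK's exact method).

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, with SNPs
    in chromosomal order. Returns the column indices of the SNPs kept.
    """
    g = np.asarray(genotypes, dtype=float)
    kept = []
    for j in range(g.shape[1]):
        # Monomorphic SNPs carry no information and have undefined r^2.
        if g[:, j].std() == 0:
            continue
        # Compare only against kept SNPs at most `window` positions upstream.
        neighbours = [k for k in kept if j - k <= window]
        independent = True
        for k in neighbours:
            r = np.corrcoef(g[:, j], g[:, k])[0, 1]
            if r * r > r2_threshold:
                independent = False
                break
        if independent:
            kept.append(j)
    return kept
```

As far as I understand, the real tools do roughly this with extra details (PLINK's `--indep-pairwise` slides the window in steps, and Hail's `hl.ld_prune` uses a base-pair window), but the cost is the same in spirit: r^2 is recomputed between nearby SNPs for every new dataset.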
However, enough human genomes have now been sequenced that essentially all common SNPs and haplotypes should already be known.
So my question is: why is this pairwise calculation still done? Isn't there a reference set of independent SNPs that can simply be used for PCA? That would be much simpler than repeating the pairwise calculation for every dataset.
If such a set exists, could someone please point me to it?