I am running ADMIXTURE analyses. In order to recapitulate some IACs from the literature, I took the intersection of the larger dataset I am working with, which is based on the Affymetrix Human Origins array, with data from some populations of interest, which were genotyped on the Illumina Omni 1M chip. I had about 140,000 SNPs after merging, and then about 111,000 or 120,000 after pruning (with --indep-pairwise 200 25 0.4 or --indep-pairwise 50 5 0.5).
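For anyone who wants to reproduce the overlap step, here is a minimal sketch of intersecting the variant IDs of two PLINK .bim files. It assumes both arrays use the same variant IDs in column 2 (often they do not, in which case matching on chromosome and position is an alternative); the function names and the toy records are made up for illustration and are not my actual data.

```python
def bim_variant_ids(lines):
    """Collect variant IDs (column 2) from the lines of a PLINK .bim file."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.add(fields[1])
    return ids


def shared_variants(bim_a_lines, bim_b_lines):
    """Return the sorted variant IDs present in both .bim files."""
    return sorted(bim_variant_ids(bim_a_lines) & bim_variant_ids(bim_b_lines))


# Toy stand-ins for the two .bim files (hypothetical records).
# Columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2
human_origins = ["1 rs1 0 1000 A G", "1 rs2 0 2000 C T", "2 rs3 0 3000 G A"]
omni = ["1 rs2 0 2000 C T", "2 rs3 0 3000 G A", "2 rs4 0 4000 T C"]

shared = shared_variants(human_origins, omni)
print(shared)
```

With real files, the same functions can be fed open file handles, and the resulting ID list can be written out and passed to PLINK's --extract before merging.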
When I use only the Affymetrix Human Origins data (about 280,000 SNPs after pruning), the CV errors reach a minimum or plateau at around 0.33, or 0.35 to 0.36.
With the overlap dataset, my CV errors are much larger. For example, with the 50 5 0.5 pruning parameters, here are my CV errors:
CV error (K=1): 0.58226
CV error (K=2): 0.54319
CV error (K=3): 0.53868
CV error (K=4): 0.53628
CV error (K=5): 0.53454
CV error (K=6): 0.53349
CV error (K=7): 0.53230
CV error (K=8): 0.53179
CV error (K=9): 0.53115
CV error (K=10): 0.53091
CV error (K=11): 0.53074
CV error (K=12): 0.53059
CV error (K=13): 0.53086
CV error (K=14): 0.53057
CV error (K=15): 0.53094
CV error (K=16): 0.53102
CV error (K=17): 0.53136
CV error (K=18): 0.53161
CV error (K=19): 0.53186
CV error (K=20): 0.53243
I am wondering why these are so high compared to the original dataset. Are they too high, or is this reasonable?
Thanks!