Hi there. I'm using adegenet in R to explore genetic structure across a fairly large set of populations with DAPC. The problem is that our sample size for some of those populations is quite small (1, 2 or 3 individuals, for example), while other populations have more than 20. Should I use all populations to find the number of clusters with "find.clusters()", or is there a minimum sample size per population that I should take into account?
I would use all of them to find the clusters, which makes sense to me. But if I do, some of these populations (n=1) will not be used in the cross-validation, since both a training set and a validation set have to be made. Will this make the number of PCs chosen with xvalDapc unreliable for the whole dataset?
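To make the concern concrete, here is a minimal base-R sketch with made-up population labels. The `pops` vector and its sizes are hypothetical; with a genind object the labels would come from `pop()`. The split is a generic stratified 90/10 split that I wrote for illustration, not necessarily what xvalDapc does internally. It only shows which populations can never end up in the validation set.

```r
# Hypothetical population labels; sizes mimic the uneven sampling described above.
pops <- factor(rep(c("A", "B", "C", "D", "E"), times = c(1, 2, 3, 20, 25)))

# Sample size per population.
n_per_pop <- table(pops)

# Populations with a single individual cannot be represented in both
# the training and the validation set of a stratified split.
singletons <- names(n_per_pop)[as.vector(n_per_pop) < 2]

# Generic stratified 90/10 split (an assumption for illustration),
# always keeping at least one individual per population for training.
set.seed(1)
train_idx <- unlist(lapply(split(seq_along(pops), pops), function(idx) {
  n_train <- max(1, floor(0.9 * length(idx)))
  idx[sample.int(length(idx), n_train)]
}))

# Populations that still have at least one individual left for validation.
valid_pops <- unique(as.character(pops[-train_idx]))

# Populations absent from the validation set.
missing_from_validation <- setdiff(levels(pops), valid_pops)
print(n_per_pop)
print(missing_from_validation)
```

With these made-up sizes, only the n=1 population is left out of validation, while the n=2 and n=3 populations each contribute a single validation individual.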
Thanks in advance.