6.6 years ago
User000
Dear all,
I am using find.clusters and DAPC on my SNP data. I am interested in K = 2 to 20. However, the clustering results differ every time I re-run the code on the same dataset; I guess this is due to the k-means algorithm that find.clusters uses. Do you know if it is possible to find an optimal set of centers to get stable results, or how to improve this?
grp <- find.clusters(obj1, n.clust = 20, n.pca = 500, stat = "BIC", n.iter = 100000, n.start = 1000)
dapcc <- dapc(obj1, grp$grp, n.pca = 50, n.da = 7)
How different are the clusters? Do they fluctuate a lot from run to run?
To optimize your clustering, I would run the find.clusters function, say, 50 times and get the optimal number of clusters using your stat = "BIC", which is the statistical measure of goodness of fit. And to reproduce your results, use set.seed(x).
No, I have two different versions, and I am interested in one of them. The problem is that I want consistent results for all runs from K = 2 to K = 20. I already know that I want to stop at K = 20 and not go further. So I literally have to re-run the same command line 50 times? Isn't n.start = 1000 doing this? Could you please explain in more detail? Also, for set.seed(x), which number should I use? The one that gave the better result? Thanks a lot!
Update: for n.clust = 20 I used set.seed(20), and it gave the same result 3 times. For n.clust = 19, should I also use set.seed(20)?
For each run, use a different set.seed() value; it can be any integer. If you set it to 20, you will always get the same results when you run the same code. Just use different seeds for different runs.
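To illustrate the point about seeds, here is a minimal sketch using base R's kmeans directly rather than find.clusters. The simulated matrix `scores` is only a hypothetical stand-in for PCA scores, not real SNP data; the seed value 20 is arbitrary.

```r
# Simulated stand-in for PCA scores: 60 individuals, 5 axes, 3 shifted groups.
set.seed(1)
scores <- matrix(rnorm(60 * 5), nrow = 60) + rep(c(0, 4, 8), each = 20)

# Resetting the same seed before each call gives identical assignments.
set.seed(20)
run_a <- kmeans(scores, centers = 3, nstart = 10)$cluster
set.seed(20)
run_b <- kmeans(scores, centers = 3, nstart = 10)$cluster
identical(run_a, run_b)

# Without resetting the seed, the cluster labels may be permuted
# and, on harder data, the partition itself may differ.
run_c <- kmeans(scores, centers = 3, nstart = 10)$cluster
table(run_a, run_c)
```

The seed only fixes the random starting centers; it does not make one seed "better" than another, so any integer works as long as it is recorded.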
Yes, but I want consistent results, similar to admixture: from K = 2 to K = 20, I want to see the species separating consistently, ending in 20 different clusters. I hope that explains what I mean.
To run K = 2 to 20, you should use max.n.clust instead of n.clust; otherwise, k-means is run for a single value of K only.
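A minimal sketch of such a scan, assuming the adegenet package is installed. The data frame `toy` is simulated and purely hypothetical; `choose.n.clust = FALSE` with a `criterion` is used here so the call does not stop to ask for K interactively.

```r
library(adegenet)

# Hypothetical toy data: 90 individuals, 20 variables, 3 shifted groups.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(90 * 20), nrow = 90) + rep(c(0, 3, 6), each = 30))

# Scan K up to max.n.clust and let the criterion pick K automatically.
set.seed(20)
scan <- find.clusters(toy, max.n.clust = 10, n.pca = 10, stat = "BIC",
                      choose.n.clust = FALSE, criterion = "diffNgroup")

scan$Kstat         # BIC value for each K that was tried
nlevels(scan$grp)  # number of clusters that was retained
```

On real data, the toy data frame would be replaced by the genind or genlight object (obj1 in the question), and max.n.clust and n.pca adjusted accordingly.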
So, if I run with max.n.clust = 20, how can I then get the membership values for K = 2 to 20?
When running find.clusters, the optimal number of clusters to retain is assessed based on the BIC values, and the result is stored in grp the same way as before.
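As far as I understand, grp then holds the memberships for the retained K only. If memberships are needed for every K from 2 to 20, one possible approach is to call find.clusters once per K with n.clust fixed and a seed set before each call. The sketch below assumes adegenet is installed; `memberships_by_k` is a hypothetical helper, not part of adegenet, and `toy` is simulated data.

```r
library(adegenet)

# Hypothetical helper: one find.clusters() run per K, each with a fixed seed,
# so that re-running the whole loop reproduces the same memberships.
memberships_by_k <- function(x, ks = 2:20, n.pca = 50, n.start = 100, seed = 20) {
  out <- lapply(ks, function(k) {
    set.seed(seed)
    find.clusters(x, n.clust = k, n.pca = n.pca, stat = "BIC",
                  n.iter = 1e5, n.start = n.start)$grp
  })
  names(out) <- paste0("K", ks)
  as.data.frame(out)  # one row per individual, one column per K
}

# Demo on simulated data: 90 individuals, 20 variables, 3 shifted groups.
set.seed(1)
toy <- as.data.frame(matrix(rnorm(90 * 20), nrow = 90) + rep(c(0, 3, 6), each = 30))
memb <- memberships_by_k(toy, ks = 2:5, n.pca = 10, n.start = 10)
head(memb)
```

With the data from the question, the call would look like memberships_by_k(obj1, ks = 2:20, n.pca = 500, n.start = 1000). Note that cluster labels are not aligned across K, so something like table(memb$K2, memb$K3) is needed to see how groups split from one K to the next.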