optimal number of clusters from distance matrix input
4.1 years ago
lagartija ▴ 160

Hi! I am trying to cluster data in R starting from a distance matrix. I have tried hclust (with cutree) and pam. The problem is that I don't know how many clusters to ask for. I know there are many methods for determining the optimal number of clusters and the clustering method (mclust, pvclust, fviz_nbclust, optCluster), but they start from quantitative data. In my case I want to do the same thing starting from a distance matrix. Since the methods I cited go through a distance matrix themselves, it would be odd to compute a distance of a distance. Do you know of other methods?

Cheers !

clustering R • 1.3k views
4.1 years ago

There is no correct or incorrect answer. What you need to think about is 'resolution'. If I stood at the edge of the Via Láctea / Milky Way and looked toward the supermassive black hole at the center of the galaxy, I would see hundreds of millions of points of light from stars all across the galaxy. How would I cluster these? One logical way is to first identify a cluster center and then set some distance threshold (say, 50 light years) within which I would define a cluster. So the Sun could be a cluster center, with every other star within 50 light years belonging to the same cluster.

In summary, there's nothing incorrect about choosing a single distance value and using that to define your clusters.
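
As a rough illustration of cutting at a single distance value, here is a minimal sketch using hclust and cutree from base R. The toy positions in `x` and the value of `threshold` are made up for the example; in practice `m` would be your own square distance matrix. Note that this is a tree cut rather than a literal "center plus radius" rule: with complete linkage, cutting at height `h` means no two members of a cluster are further apart than `h`.

```r
# Hypothetical toy input: six items on a line, used only to build a distance matrix.
x <- c(0, 1, 2.5, 10, 11.2, 13)
m <- as.matrix(dist(x))   # stand-in for a precomputed square distance matrix

# Convert the square matrix to a 'dist' object, which hclust() expects.
d <- as.dist(m)

# Complete linkage: the merge height is the largest pairwise distance
# between the two clusters being merged.
hc <- hclust(d, method = "complete")

# Assumed threshold: the 'resolution' at which to define clusters.
threshold <- 4
clusters <- cutree(hc, h = threshold)

# Cluster sizes at this resolution.
print(table(clusters))
```

Raising or lowering `threshold` changes the resolution: a smaller value gives more, tighter clusters, and a larger value gives fewer, broader ones.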

The real concern for me is why you just have the distance matrix and nothing else, which may reflect some underlying disorganisation in your group.


Thank you, I think I agree. I only have a distance matrix because I am comparing images pairwise and the first step is to output a distance.
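
For a distance-only input like this, one option that needs nothing beyond the dissimilarities is the average silhouette width, which pam in the cluster package reports when given a dist object. The sketch below is only a hedged example: the toy positions in `x`, the candidate range `ks`, and the variable names are assumptions for illustration, and the silhouette is one heuristic among several, not a definitive answer to how many clusters there are.

```r
library(cluster)  # provides pam(); a recommended package shipped with R

# Hypothetical toy input standing in for pairwise image distances.
x <- c(0, 1, 2, 20, 21, 22, 40, 41, 42)
d <- as.dist(as.matrix(dist(x)))   # in practice: as.dist(your_matrix)

# Candidate numbers of clusters to compare (assumed range).
ks <- 2:5

# For each k, run PAM directly on the dissimilarities and record
# the average silhouette width of the resulting partition.
avg_sil <- sapply(ks, function(k) {
  fit <- pam(d, k = k, diss = TRUE)
  fit$silinfo$avg.width
})

# The k with the largest average silhouette width.
best_k <- ks[which.max(avg_sil)]
print(data.frame(k = ks, avg_silhouette = avg_sil))
print(best_k)
```

The same score can be computed for a hierarchical cut with `silhouette(cutree(hc, k), d)`, since silhouette() also accepts a vector of cluster labels together with a dist object.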
