clustering from the minimum distance
5.5 years ago

Hi all,

I have the following distance matrix, which is the direct output of a pairwise sequence-distance metric.

> as.dist(df)
       0      1      2      3      4      5      6      7      8
1 1.9356                                                        
2 1.6758 2.8880                                                 
3 1.9664 1.0587 2.4737                                          
4 2.1619 1.2724 2.5110 1.1447                                   
5 1.8347 1.0197 2.1482 1.1709 1.2174                            
6 1.9889 1.0422 2.4029 1.0205 0.3976 1.0199                     
7 0.8700 2.3598 1.4906 1.8574 2.8255 2.4992 2.2814              
8 1.6657 0.5076 2.8697 1.1120 1.3185 1.0617 1.1108 1.9752       
9 1.7172 3.7109 1.9279 3.8676 2.3161 2.1345 2.1262 1.6730 2.7601

I would like to cluster these by merging from the shortest distance to the longest. For instance, I would expect 4 and 6 to be joined under a node at height 0.3976 (their distance), 1 and 8 at 0.5076, and 0 and 7 at 0.8700. Then 3, whose closest point is 6 (distance 1.0205), should join the (4,6) cluster.
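The procedure described above (repeatedly merge the two closest clusters, where the distance between clusters is the smallest pairwise distance between their members) is single-linkage agglomerative clustering. A minimal base-R sketch, assuming the values are transcribed from the printed matrix (4 decimals); the names `rows`, `m`, `d`, `D`, `clusters` and `heights` are illustrative only:

```r
# Lower-triangle rows exactly as printed (labels 0..9).
rows <- list(
  c(1.9356),
  c(1.6758, 2.8880),
  c(1.9664, 1.0587, 2.4737),
  c(2.1619, 1.2724, 2.5110, 1.1447),
  c(1.8347, 1.0197, 2.1482, 1.1709, 1.2174),
  c(1.9889, 1.0422, 2.4029, 1.0205, 0.3976, 1.0199),
  c(0.8700, 2.3598, 1.4906, 1.8574, 2.8255, 2.4992, 2.2814),
  c(1.6657, 0.5076, 2.8697, 1.1120, 1.3185, 1.0617, 1.1108, 1.9752),
  c(1.7172, 3.7109, 1.9279, 3.8676, 2.3161, 2.1345, 2.1262, 1.6730, 2.7601)
)
m <- matrix(0, 10, 10, dimnames = list(0:9, 0:9))
for (i in seq_along(rows)) m[i + 1, seq_along(rows[[i]])] <- rows[[i]]
d <- as.dist(m)   # as.dist() reads only the lower triangle
D <- as.matrix(d) # full symmetric matrix, indexable by label

# Naive single-linkage merging: every label starts as its own cluster.
clusters <- as.list(rownames(D))
heights <- numeric(0)
while (length(clusters) > 1) {
  best <- list(h = Inf, i = NA, j = NA)
  for (i in seq_len(length(clusters) - 1)) {
    for (j in (i + 1):length(clusters)) {
      # Cluster distance = minimum pairwise distance between members.
      h <- min(D[clusters[[i]], clusters[[j]]])
      if (h < best$h) best <- list(h = h, i = i, j = j)
    }
  }
  cat(sprintf("%.4f  {%s} + {%s}\n", best$h,
              paste(clusters[[best$i]], collapse = ","),
              paste(clusters[[best$j]], collapse = ",")))
  heights <- c(heights, best$h)
  # Merge cluster j into cluster i, then drop j (j > i, so i is unaffected).
  clusters[[best$i]] <- c(clusters[[best$i]], clusters[[best$j]])
  clusters[[best$j]] <- NULL
}
```

With these values the merge heights come out as 0.3976, 0.5076, 0.8700, 1.0197, 1.0199, 1.0205, 1.4906, 1.6657, 1.6730, i.e. every node sits at an actual entry of the original matrix. Note that (4,6) first absorbs (1,5,8) at 1.0199 (via 5 to 6) before 3 joins at 1.0205.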

Hierarchical clustering seems like the right approach, but I could not find a linkage or distance method whose output matches what I expect. The merge heights look inflated and do not reflect the original metric.

> hc <- hcluster(as.dist(df), link = "single")
> hc$height
[1] 0.916764 1.285251 1.674859 1.707090 1.759598 1.937444 2.617993 3.196162 3.667614
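One way to see where the inflated heights come from: `hcluster()` from the amap package expects a data matrix (observations in rows) and computes its own distances, Euclidean by default. Passing `as.dist(df)` therefore turns each row of the distance matrix into a 10-dimensional feature vector. The sketch below reproduces that in base R (data transcribed from the printed matrix; variable names are illustrative):

```r
rows <- list(
  c(1.9356),
  c(1.6758, 2.8880),
  c(1.9664, 1.0587, 2.4737),
  c(2.1619, 1.2724, 2.5110, 1.1447),
  c(1.8347, 1.0197, 2.1482, 1.1709, 1.2174),
  c(1.9889, 1.0422, 2.4029, 1.0205, 0.3976, 1.0199),
  c(0.8700, 2.3598, 1.4906, 1.8574, 2.8255, 2.4992, 2.2814),
  c(1.6657, 0.5076, 2.8697, 1.1120, 1.3185, 1.0617, 1.1108, 1.9752),
  c(1.7172, 3.7109, 1.9279, 3.8676, 2.3161, 2.1345, 2.1262, 1.6730, 2.7601)
)
m <- matrix(0, 10, 10, dimnames = list(0:9, 0:9))
for (i in seq_along(rows)) m[i + 1, seq_along(rows[[i]])] <- rows[[i]]
d <- as.dist(m)

# Treat the full symmetric distance matrix as data and take Euclidean
# distances between its rows (a distance of distances).
d_rows <- dist(as.matrix(d), method = "euclidean")

# Row-wise distance between points 4 and 6, versus their original 0.3976.
as.matrix(d_rows)["4", "6"]
hclust(d_rows, method = "single")$height
```

The 4 to 6 value works out to about 0.916764, the same as the first height in the `hcluster` output above, which suggests the heights were measured on this derived distance rather than on the original metric.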

Would you guys have any suggestions?

Tags: clustering, R

I found out I should have used hclust instead of hcluster :(
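For reference, a minimal sketch of the `hclust()` route: base R's `hclust()` accepts a precomputed `dist` object directly, and `method = "single"` gives the merge-at-minimum-distance behaviour described in the question. The data are transcribed from the printed matrix; the cut height 1.1 is an arbitrary value chosen for illustration:

```r
rows <- list(
  c(1.9356),
  c(1.6758, 2.8880),
  c(1.9664, 1.0587, 2.4737),
  c(2.1619, 1.2724, 2.5110, 1.1447),
  c(1.8347, 1.0197, 2.1482, 1.1709, 1.2174),
  c(1.9889, 1.0422, 2.4029, 1.0205, 0.3976, 1.0199),
  c(0.8700, 2.3598, 1.4906, 1.8574, 2.8255, 2.4992, 2.2814),
  c(1.6657, 0.5076, 2.8697, 1.1120, 1.3185, 1.0617, 1.1108, 1.9752),
  c(1.7172, 3.7109, 1.9279, 3.8676, 2.3161, 2.1345, 2.1262, 1.6730, 2.7601)
)
m <- matrix(0, 10, 10, dimnames = list(0:9, 0:9))
for (i in seq_along(rows)) m[i + 1, seq_along(rows[[i]])] <- rows[[i]]
d <- as.dist(m)

# hclust() uses the supplied distances as-is; no recomputation.
hc <- hclust(d, method = "single")
hc$height   # merge heights, taken straight from the original matrix
hc$merge    # negative entries are single observations, positive are earlier merges

# Flat clusters obtained by cutting the tree at an illustrative height.
groups <- cutree(hc, h = 1.1)
groups
```

`plot(hc)` draws the corresponding dendrogram. Other linkages (`"average"`, `"complete"`) also work on the same `dist` object, but only single linkage places every node exactly at the smallest distance between the two groups it joins.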


So, all is okay now?
