Hi all,
I have the following distance matrix, which is the direct output of a pairwise sequence metric.
> as.dist(df)
       0      1      2      3      4      5      6      7      8
1 1.9356
2 1.6758 2.8880
3 1.9664 1.0587 2.4737
4 2.1619 1.2724 2.5110 1.1447
5 1.8347 1.0197 2.1482 1.1709 1.2174
6 1.9889 1.0422 2.4029 1.0205 0.3976 1.0199
7 0.8700 2.3598 1.4906 1.8574 2.8255 2.4992 2.2814
8 1.6657 0.5076 2.8697 1.1120 1.3185 1.0617 1.1108 1.9752
9 1.7172 3.7109 1.9279 3.8676 2.3161 2.1345 2.1262 1.6730 2.7601
I would like to cluster these from the smallest distance to the largest. For instance, I would expect 4 and 6 to be joined under a node at height 0.3976, 1 and 8 at 0.5076, and 0 and 7 at 0.8700. Then 3, whose minimum distance is to 6, should join (4,6).
Although hierarchical clustering seems to work, I could not find a linkage or distance method that gives output close to what I expect. The heights in the output often look inflated and do not properly represent the metric.
> hc <- hcluster(as.dist(df), link = "single")
> hc$height
[1] 0.916764 1.285251 1.674859 1.707090 1.759598 1.937444 2.617993 3.196162 3.667614
Would you guys have any suggestions?
I found out I was supposed to use hclust instead of hcluster :(
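For anyone finding this later, here is a minimal sketch of the hclust call, assuming the matrix above is rebuilt by hand (the names vals, m and d are just placeholders, not from the original session). As far as I understand it, hcluster (from the amap package) computes its own distances from the rows of whatever it is given, which would explain the inflated heights, whereas hclust takes the dist object as is.

```r
# Lower-triangle values copied row by row from the printed matrix above
vals <- c(
  1.9356,
  1.6758, 2.8880,
  1.9664, 1.0587, 2.4737,
  2.1619, 1.2724, 2.5110, 1.1447,
  1.8347, 1.0197, 2.1482, 1.1709, 1.2174,
  1.9889, 1.0422, 2.4029, 1.0205, 0.3976, 1.0199,
  0.8700, 2.3598, 1.4906, 1.8574, 2.8255, 2.4992, 2.2814,
  1.6657, 0.5076, 2.8697, 1.1120, 1.3185, 1.0617, 1.1108, 1.9752,
  1.7172, 3.7109, 1.9279, 3.8676, 2.3161, 2.1345, 2.1262, 1.6730, 2.7601
)

labels <- as.character(0:9)
m <- matrix(0, 10, 10, dimnames = list(labels, labels))

# upper.tri() is filled column by column, which matches the
# row-by-row order of the printed lower triangle
m[upper.tri(m)] <- vals
m <- m + t(m)          # make the matrix symmetric
d <- as.dist(m)

# hclust() clusters the supplied distances directly;
# note the argument is 'method', not 'link'
hc <- hclust(d, method = "single")

print(hc$height)       # merge heights, taken straight from the metric
print(hc$merge)        # merge order (negative entries are single items)
```

With single linkage the first three heights should be 0.3976, 0.5076 and 0.8700, i.e. the (4,6), (1,8) and (0,7) pairs from the question. plot(hc) draws the dendrogram.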
So, all is okay now?