How To Get The Clear Values At The Bottom Of A Dendrogram In Clustering In R?
12.4 years ago
grosy ▴ 100

Dear Friends,

I have a large amount of data to cluster in R. When I cluster it, the labels at the bottom of the dendrogram overlap and merge together, which makes the values very difficult to read.

clustered data with merged values at the bottom

http://www.freeimagehosting.net/ryiu3

Could anyone please help me fix this so that the values at the bottom of the dendrogram are clearly readable in R?

The code:

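 # Read the data and standardise each column
 a <- read.csv("C:\\file.csv", header = TRUE)
 b <- scale(a)
 # Spearman correlation between rows, converted to a distance
 c <- cor(t(b), method = "spearman")
 d <- as.dist(1 - c)
 # Complete-linkage clustering, plotted with two label layouts
 hr <- hclust(d, method = "complete", members = NULL)
 par(mfrow = c(2, 2))
 plot(hr, hang = 0.1)
 plot(hr, hang = -1)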
r programming clustering • 13k views

Could you give us the code you are using and the output it creates so that we can visualise what the problem is?

12.4 years ago
kstamm ▴ 50

With so many values your options are to either draw an enormous picture (as in Michael Dondrop's answer) or to skip the picture and use some textual output.

The object returned by hclust holds the tree (its merge, height, order and labels components), and as.dendrogram() converts it into a nested list that you can access from code. Each node of that dendrogram has a height attribute and, unless it is a leaf, a list of child branches (a left and a right one for hclust output), each of which is again a node. You have to traverse the structure with a loop or recursion to get at the subclusters. The leaf order is also available, as hc$order or via order.dendrogram(), so you at least will know the order of the leaves.
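A minimal sketch of such a traversal, assuming the tree is obtained by calling as.dendrogram() on the hclust result. The built-in USArrests data stands in for the real input, and collect_nodes is a hypothetical helper name, not part of R:

```r
# Built-in USArrests data is used here only as a stand-in.
hc <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

# Hypothetical helper: walk the nested list recursively and record, for
# every internal node, its height and the number of leaves below it.
collect_nodes <- function(node) {
  if (is.leaf(node)) {
    return(NULL)
  }
  here <- data.frame(height = attr(node, "height"),
                     members = attr(node, "members"))
  # Each element of an internal node is itself a dendrogram node.
  below <- lapply(seq_along(node), function(i) collect_nodes(node[[i]]))
  do.call(rbind, c(list(here), below))
}

nodes <- collect_nodes(dend)
# Largest merges first.
head(nodes[order(-nodes$height), ])

# Leaf labels in left-to-right plotting order.
leaf_labels <- labels(dend)
```

For very large or very unbalanced trees the recursion can get deep, so an explicit stack may be needed instead.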

Given a height cutoff threshold you could separate this into a reasonable number of subtrees and maybe draw those separately.
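One way to try this is the cut() method for dendrograms. The sketch below uses USArrests as stand-in data, and the cutoff of 40% of the tree height is an arbitrary assumption that needs tuning for real data:

```r
hc <- hclust(dist(USArrests), method = "complete")
dend <- as.dendrogram(hc)

# Assumed cutoff: 40% of the total tree height.
cutoff <- 0.4 * max(hc$height)
parts <- cut(dend, h = cutoff)

# parts$upper is the tree above the cutoff,
# parts$lower is a list of the subtrees below it.
n_subtrees <- length(parts$lower)
sizes <- sapply(parts$lower, function(x) attr(x, "members"))

# Draw the overview first, then every subtree in its own plot.
plot(parts$upper, main = "Above the cutoff")
for (i in seq_along(parts$lower)) {
  sub <- parts$lower[[i]]
  # A subtree made of a single leaf has no branches worth drawing.
  if (attr(sub, "members") > 1) {
    plot(sub, main = paste("Subtree", i))
  }
}
```

Each subtree then has far fewer leaves than the full tree, so its labels have room to be printed without overlapping.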

I don't have the code available here, but the principle is straightforward. Convert the hclust result to a dendrogram with as.dendrogram() and you can dig through it.

The built-in R help at ?hclust has an example that cuts the tree into ten clusters, like so:

hc <- hclust(dist(USArrests)^2, "cen")  
memb <- cutree(hc, k = 10)
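
As a rough sketch of the textual-output route, the membership vector can be tabulated and split into per-cluster label lists. The variable name groups is only illustrative:

```r
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)

# Number of observations in each of the ten clusters.
table(memb)

# Labels grouped by cluster, as plain text instead of a picture.
groups <- split(names(memb), memb)
groups[[1]]
```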
12.4 years ago
Michael 55k

Try producing an SVG file with the svg() function (requires Cairo) and opening it in your web browser or a graphics program:

svg(width=50)
plot(hr, hang=-1, cex=0.5)
dev.off()

This gives non-overlapping labels and looks OK-ish for about 500 data points. For more or fewer points, or a different width, you can experiment with the values.
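
A sketch of the same idea with an explicit file name and a width that scales with the number of leaves. The factor of 0.1 inch per leaf is an assumption derived from the 50 inches for about 500 points above, USArrests stands in for the real data, and the output path is only an example:

```r
# Stand-in data; with real data, build hr as in the question.
hr <- hclust(dist(USArrests), method = "complete")

# Assumption: about 0.1 inch of width per leaf, never below the 7 inch default.
n_leaves <- length(hr$order)
plot_width <- max(7, 0.1 * n_leaves)

# Example output location; any writable path works.
out_file <- file.path(tempdir(), "dendrogram.svg")

svg(out_file, width = plot_width, height = 7)
plot(hr, hang = -1, cex = 0.5)
dev.off()

out_file
```

Naming the file explicitly avoids having to look for the default Rplot001.svg afterwards.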


Sorry, I am new to R. Could you please tell me how to generate an SVG file from my file in CSV format?


It is showing:

null device 1


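It has already generated it. That message is just the return value of dev.off(). The file is named Rplot001.svg and is in R's working directory (see getwd()), which on Windows is typically your Documents folder.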

12.4 years ago

You are going to have to play with the plot() options to build a larger plot with smaller labels. For instance, have you tried using the 'cex' option in the plot() call?

Also, to help you, the labels on the figure can be retrieved in plotting order this way: hc$labels[hc$order]
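
A small sketch combining both suggestions, using USArrests as stand-in data. The cex value of 0.4 and the output file name are arbitrary assumptions:

```r
hc <- hclust(dist(USArrests), method = "complete")

# Labels in the left-to-right order used at the bottom of the plot.
ordered_labels <- hc$labels[hc$order]
head(ordered_labels)

# Save them as text so they can be read even if the plot is crowded.
label_file <- file.path(tempdir(), "dendrogram_labels.txt")
writeLines(ordered_labels, label_file)

# Smaller label text; lower cex further for more leaves.
plot(hc, hang = -1, cex = 0.4)
```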
