For a while, I've been using the pvclust package to generate dendrograms of samples from microarray data. Recently I got a large data set, divided into 4 classes, and since the number of samples is relatively high I need to color the dendrogram leaves (or at least the labels) according to their original class (e.g., class 1 would get green leaves/labels).
Note that the original data file has about 19K rows and more than 40 samples.
I obtain my dendrogram as
require(pvclust)
mydata <- read.delim("mydata.txt", row.names=1)
library(parallel)
cl <- makeForkCluster(8)
pvclust_result <- parPvclust(cl=cl, data=as.matrix(mydata), nboot=100)
result_dendrogram <- as.dendrogram(pvclust_result$hclust)
classes <- factor(additional_data$class, levels=c("Class1", "Class2", "Class3", "Class4"))
(additional_data is a data.frame with the required information)
The problem is that I've tried a few solutions and none of them seems to work. Heatplus, from the Bioconductor repositories, tries to cluster the data again even though I provide a dendrogram (and it's too slow). The example in the R gallery does not work either, as it colors the groups found in the dendrogram, not the original classes.
What is the easiest way to color either dendrogram leaves or just labels if needed according to the factors? (I have a mapping of label name - class that I can use)
Thanks in advance.
EDIT:
This is the version that I used, based on David Quigley's answer:
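# Plot dendrogram D with leaves and labels colored per sample.
# names(M) must give the leaf labels (e.g. the column names of a data frame);
# colors holds one color per label, in the same order.
plot.cluster.colors = function( M, D, colors, force.flat=FALSE, show.labels=TRUE, label.cex=1, ... ){
  # Lookup table mapping each leaf label to its color
  lookup = data.frame( leaf.name = names(M), color=colors, stringsAsFactors=FALSE )
  # Note: hangval is computed here but not used by the calls below
  if(force.flat)
    hangval = -1
  else
    hangval = 0.1
  # Apply the per-leaf coloring to every node of the dendrogram
  dend_colored = dendrapply(D, color.dendrogram.labels, label.cex=label.cex, show.labels=show.labels, color.lookup=lookup)
  plot(dend_colored, ...)
}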
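# Called by dendrapply on every node; only leaves are modified.
color.dendrogram.labels = function(n, label.cex=1, show.labels=TRUE, color.lookup=NULL){
  if(is.leaf(n)){
    a = attributes(n)
    # Color of the first lookup row whose name matches this leaf's label
    color = color.lookup$color[ which(color.lookup$leaf.name==a$label)[1] ]
    if( show.labels ){
      cex.val = label.cex
      lab.color = color
    }
    else
    {
      # Effectively hide the label: tiny and white
      cex.val=0.01
      lab.color="white"
    }
    # lab.col/lab.cex style the label; col/pch/cex style the leaf point
    attr(n, "nodePar") = c(a$nodePar, list(lab.col = lab.color, lab.cex=cex.val, col=color, pch=15, cex=1 ) )
  }
  n
}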
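For anyone who wants to try the idea without my data, here is a minimal self-contained sketch of the same approach on toy data. It is not the code above, just a compact variant: the matrix `expr`, the data frame `sample_info`, the color choices, and the helper `color_leaf` are all made up for illustration, and a plain `hclust` stands in for `pvclust_result$hclust`.

```r
# Toy stand-in for the expression matrix: 200 genes (rows) x 12 samples (columns).
set.seed(1)
expr <- matrix(rnorm(200 * 12), nrow = 200,
               dimnames = list(NULL, paste0("S", 1:12)))

# Hypothetical label -> class mapping (plays the role of additional_data).
sample_info <- data.frame(
  sample = colnames(expr),
  class  = rep(c("Class1", "Class2", "Class3", "Class4"), each = 3),
  stringsAsFactors = FALSE
)

# One color per class, then one color per sample, named by sample label.
class_colors <- c(Class1 = "green", Class2 = "blue",
                  Class3 = "red", Class4 = "orange")
label_colors <- setNames(class_colors[sample_info$class], sample_info$sample)

# Cluster the samples (columns); pvclust_result$hclust could be used here instead.
dend <- as.dendrogram(hclust(dist(t(expr))))

# Set the label color and point color of each leaf from the lookup.
color_leaf <- function(node) {
  if (is.leaf(node)) {
    leaf_col <- unname(label_colors[attr(node, "label")])
    attr(node, "nodePar") <- c(attr(node, "nodePar"),
                               list(lab.col = leaf_col, col = leaf_col, pch = 15))
  }
  node
}
dend_colored <- dendrapply(dend, color_leaf)

plot(dend_colored)
legend("topright", legend = names(class_colors), fill = class_colors)
```

Using a named vector for the lookup keeps the coloring independent of the order in which the leaves end up in the dendrogram, since each leaf is matched by its label.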
Thank you, it works! I'll be adjusting this but this is exactly what I needed.
Great. I'll modify my own quick-and-dirty code at some point to use the function parameters correctly.