pheatmap vs. manual hclust with different results
7.7 years ago
sdominguez ▴ 10

I am stuck on a problem with hierarchical clustering. I want to make a dendrogram and a heatmap, using a correlation-based distance (d_mydata=dist(1-cor(t(mydata)))) and ward.D2 as the clustering method.

As a convenience, the pheatmap package can plot the row dendrogram on the left side of the heatmap to visualize the clusters.

The pipeline of my analysis would be this:

1. create the dendrogram
2. test how many clusters would be optimal (k)
3. extract the subjects in each cluster
4. create a heatmap

To my surprise, the dendrogram plotted in the heatmap is not the same as the one plotted before, even though the methods are the same.

So I decided to create a pheatmap with rows coloured by the clusters previously assigned by cutree, to test whether the colours correspond to the clusters in the dendrogram.

This is my code:

Create test matrix

test = matrix(rnorm(200), 20, 10)
test[1:10, seq(1, 10, 2)] = test[1:10, seq(1, 10, 2)] + 3
test[11:20, seq(2, 10, 2)] = test[11:20, seq(2, 10, 2)] + 2
test[15:20, seq(2, 10, 2)] = test[15:20, seq(2, 10, 2)] + 4
colnames(test) = paste("Test", 1:10, sep = "")
rownames(test) = paste("Gene", 1:20, sep = "")
test<-as.data.frame(test)

Create a dendrogram with this test matrix

dist_test<-dist(test)
hc=hclust(dist_test, method="ward.D2")

plot(hc)

dend<-as.dendrogram(hc, check=F, nodePar=list(cex = .000007),leaflab="none", cex.main=3, axes=F, adjust=F)

clus2 <- as.factor(cutree(hc, k=2)) # cut tree into 2 clusters
groups<-data.frame(clus2)
groups$id<-rownames(groups)

-----------DATAFRAME WITH mydata AND THE CLASSIFICATION OF CLUSTERS AS FACTORS---------------------

test$id<-rownames(test)
clusters<-merge(groups, test, by.x="id")
rownames(clusters)<-clusters$id

clusters$clus2<-as.character(clusters$clus2)
clusters$clus2[clusters$clus2=="1"]<-"cluster1"
clusters$clus2[clusters$clus2=="2"]<-"cluster2"

plot(dend, main = "test", horiz = TRUE, leaflab = "none")

d_clusters<-dist(1-cor(t(clusters[,7:10])))
hc_cl=hclust(d_clusters, method="ward.D2")

annotation_col = data.frame(Path = factor(colnames(clusters[3:12])))
rownames(annotation_col) = colnames(clusters[3:12])

annotation_row = data.frame(Group = factor(clusters$clus2))
rownames(annotation_row) = rownames(clusters)

Specify colors

ann_colors = list(
  Path = c(Test1="darkseagreen", Test2="lavenderblush2", Test3="lightcyan3",
           Test4="mediumpurple", Test5="red", Test6="blue", Test7="brown",
           Test8="pink", Test9="black", Test10="grey"),
  Group = c(cluster1="yellow", cluster2="blue")
)

library(RColorBrewer)
cols <- colorRampPalette(brewer.pal(10, "RdYlBu"))(20)
library(pheatmap)
pheatmap(clusters[ ,3:12], color = rev(cols), scale = "column", kmeans_k = NA,
         show_rownames = F, show_colnames = T,
         main = "Heatmap CK14, CK5/6, GATA3 and FOXA1 n=492 SCALE",
         clustering_method = "ward.D2",
         cluster_rows = TRUE, cluster_cols = TRUE,
         clustering_distance_rows = "correlation",
         clustering_distance_cols = "correlation",
         annotation_row = annotation_row, annotation_col = annotation_col,
         annotation_colors = ann_colors)


You are not scaling your data when you run hclust(dist(data)), but in pheatmap you scale it by column (scale = "column").

The pheatmap help page says it uses hclust, so I think the discrepancy comes from not giving both the same input. pheatmap also returns the trees it built (tree_row and tree_col in its return value), so check: 1) whether your manual tree matches pheatmap's, and 2) that you scale your data the same way outside pheatmap.
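To make that comparison concrete, here is a minimal sketch. It assumes that pheatmap scales the matrix first and then, for clustering_distance_rows = "correlation", clusters rows on as.dist(1 - cor(t(mat))); that is my reading of its behaviour and worth verifying against your installed version. The object names (mat, scaled, d_manual, hc_manual) are made up for illustration, and the pheatmap part only runs if the package is installed.

```r
set.seed(1)  # reproducible toy data, same shape as the question's test matrix
mat <- matrix(rnorm(200), 20, 10)
mat[1:10, seq(1, 10, 2)] <- mat[1:10, seq(1, 10, 2)] + 3
colnames(mat) <- paste0("Test", 1:10)
rownames(mat) <- paste0("Gene", 1:20)

# Same preprocessing as scale = "column": centre and scale each column
scaled <- scale(mat)

# Correlation distance between rows, then Ward clustering
d_manual  <- as.dist(1 - cor(t(scaled)))
hc_manual <- hclust(d_manual, method = "ward.D2")
clus2     <- cutree(hc_manual, k = 2)

# Optional check against pheatmap, skipped if the package is missing
if (requireNamespace("pheatmap", quietly = TRUE)) {
  ph <- pheatmap::pheatmap(mat, scale = "column",
                           clustering_method = "ward.D2",
                           clustering_distance_rows = "correlation",
                           silent = TRUE)
  # Compare the leaf order of the two row dendrograms
  print(identical(ph$tree_row$order, hc_manual$order))

  # Alternatively, hand the precomputed tree to pheatmap so that the
  # standalone dendrogram and the heatmap dendrogram are the same object
  pheatmap::pheatmap(mat, scale = "column", cluster_rows = hc_manual,
                     silent = TRUE)
}
```

Passing the hclust object through cluster_rows is probably the simplest way to guarantee that the cutree groups and the heatmap dendrogram agree, since only one tree is ever computed.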


I assume you have to change

dist_test<-dist(test)

with something like

dist_test<-as.dist((1 - cor(t(test)))/2)

to use correlation distance between rows (cor() correlates columns, hence the t()).
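The distinction matters because dist(1 - cor(...)), as written in the question, is not the same thing as as.dist(1 - cor(...)): the former computes Euclidean distances between the rows of the dissimilarity matrix, while the latter simply treats the matrix entries as the distances. A small base-R sketch with made-up data (x and the d_* and hc_* names are illustrative only):

```r
set.seed(42)
x <- matrix(rnorm(60), nrow = 6,
            dimnames = list(paste0("Gene", 1:6), paste0("Test", 1:10)))

dissim <- 1 - cor(t(x))        # 6 x 6 row-by-row correlation dissimilarity

d_as   <- as.dist(dissim)      # uses the entries themselves as distances
d_half <- as.dist(dissim / 2)  # same, rescaled to the range [0, 1]
d_eucl <- dist(dissim)         # Euclidean distance BETWEEN ROWS of dissim

# The two constructions hold different numbers
print(isTRUE(all.equal(as.vector(d_as), as.vector(d_eucl))))

# Dividing by a constant only rescales merge heights, not the tree itself
hc_as   <- hclust(d_as,   method = "ward.D2")
hc_half <- hclust(d_half, method = "ward.D2")
print(identical(hc_as$merge, hc_half$merge))
```

So the /2 is a matter of convention, whereas swapping dist() for as.dist() changes what is being clustered.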

7.7 years ago
igor 13k

I had a related issue before. You may find this thread helpful: Clustering differences between heatmap.2 and pheatmap
