Pheatmap cluster number labels
2
0
Entering edit mode
8.4 years ago
thomasterTW ▴ 20

My question is very simple. When I use the clustering option in pheatmap, for example:

res<-pheatmap(matrix, scale="row", cluster_cols=FALSE, show_rownames=FALSE, cutree_rows=8, color=colorRampPalette(c("red","black","green"))(50), cellwidth=30, cellheight=0.4, main="C2C7_Differential_Gene_Expression", fontsize=4,filename="C2C7_Differential_Gene_Expression8clusters.pdf")

I get a cool heatmap image with 8 differentiated clustes. These clusters however are not labeled. After writing the data to a file:

b<-data.frame(cutree(res$tree_row, k=8))
write.table(as.data.frame(b), file="C2C7_Differential_Gene_Expression8clusters.txt", sep="\t")

I assumed cluster 1 in this file should be the top cluster in the image. This turns out to not be the case. I think the cluster numbers in the file are ordered based on the tree branching.

As I do not want to waste time figuring out which cluster number corresponds to which cluster in the image, I wonder: Is there an option in pheatmap to label the clusters in the image?

RNA-Seq pheatmap clustering • 13k views
ADD COMMENT
4
Entering edit mode
7.0 years ago
sisterdot ▴ 40

you approach was very close to what i think you need to do (at least using pheatmap_1.0.8). the last step will give you the cluster assignment in the right order.

library(pheatmap)
k.fix=4
mat = matrix(rnorm(200), 20, 10)
res<-pheatmap(mat, cutree_rows=k.fix,show_rownames = T)
cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]]

pheatmap seems to use the setting of cutree_rows only for determining gaps

if (!is.na(cutree_rows)) {
     gaps_row = find_gaps(tree_row, cutree_rows)
}

and the find_gaps function does

    find_gaps = function(tree, cutree_n){
        v = cutree(tree, cutree_n)[tree$order]
        gaps = which((v[-1] - v[-length(v)]) != 0)
}

it does not appear as if the cluster assignment can be extracted from what the pheatmap function returns.

invisible(list(tree_row = tree_row, tree_col = tree_col,  kmeans = km, gtable = gt))
ADD COMMENT
0
Entering edit mode

and for the complete ordered matrix with cluster info:

mat.order <- cbind(mat[c(res$tree_row[["order"]]),res$tree_col[["order"]]],cluster=cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]])
ADD REPLY
1
Entering edit mode
8.4 years ago
EagleEye 7.6k

To get the data in the exact order as in heatmap use this,

A: To extract row names of plotted heatmap

ADD COMMENT
0
Entering edit mode

Thanks! That's what I need. However, when I try this method I get an error.

res2<-matrix[c(res$tree_row[["order"])]]
Error in `[.data.frame`(matrix, c(res$tree_row[["order"]])) :
  undefined columns selected
ADD REPLY
0
Entering edit mode

I already fixed the problem. Should be res2<-matrix[c(res$tree_row[["order"]]),]

ADD REPLY
0
Entering edit mode

I still can't get it to work :(

This it the output of pheatmap:

> res
$tree_row

Call:
hclust(d = d, method = method)

Cluster method   : complete
Distance         : euclidean
Number of objects: 6445

$tree_col
[1] NA

$kmeans
[1] NA

$gtable
TableGrob (5 x 6) "layout": 5 grobs
  z     cells      name                        grob
1 1 (1-1,3-3)      main         text[GRID.text.179]
2 2 (4-4,1-1)  row_tree polyline[GRID.polyline.180]
3 3 (4-4,3-3)    matrix       gTree[GRID.gTree.182]
4 4 (5-5,3-3) col_names         text[GRID.text.183]
5 5 (4-5,5-5)    legend       gTree[GRID.gTree.186]

When applying my original method ( b <- data.frame(cutree(res$tree_row, k=5))) the output looks like this

> head(b)
                   cutree.res.tree_row..k...5.
ENSMUSG00000000001                           1
ENSMUSG00000000028                           2
ENSMUSG00000000031                           3
ENSMUSG00000000056                           3
ENSMUSG00000000058                           1
ENSMUSG00000000078                           1

With the suggested method (res2<-matrix[c(res$tree_row[["order"]]),]) the output looks like this

> head(res2)
                      FPKM_1b    FPKM_2b   FPKM_3b   FPKM_4b   FPKM_5b   FPKM_6b
ENSMUSG00000043257  0.3025560  0.4250808 0.4468834 0.5655751 0.5719265 0.4285203
ENSMUSG00000020283  0.9988489  1.0104823 1.1025160 1.2239578 1.1445088 1.0986714
ENSMUSG00000028469 -0.1916256 -0.1084386 0.1324678 0.3324304 0.2095445 0.0737440
ENSMUSG00000021939  1.5766039  1.5866526 1.6339802 1.6939370 1.6958720 1.6544634
ENSMUSG00000068040  1.2922228  1.2956221 1.3249447 1.3735446 1.3566950 1.3483341
ENSMUSG00000062611  2.6689941  2.7180206 2.6920966 2.7256585 2.7129930 2.7073197

What do I need to do to get the output in the way of my original method? I only need the gene ID and in which cluster it is located.

ADD REPLY

Login before adding your answer.

Traffic: 1835 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6