Question

Pheatmap cluster number labels

0

Entering edit mode

8.4 years ago

thomasterTW ▴ 20

My question is very simple. When I use the clustering option in pheatmap, for example:

res<-pheatmap(matrix, scale="row", cluster_cols=FALSE, show_rownames=FALSE, cutree_rows=8, color=colorRampPalette(c("red","black","green"))(50), cellwidth=30, cellheight=0.4, main="C2C7_Differential_Gene_Expression", fontsize=4,filename="C2C7_Differential_Gene_Expression8clusters.pdf")

I get a cool heatmap image with 8 differentiated clustes. These clusters however are not labeled. After writing the data to a file:

b<-data.frame(cutree(res$tree_row, k=8))
write.table(as.data.frame(b), file="C2C7_Differential_Gene_Expression8clusters.txt", sep="\t")

I assumed cluster 1 in this file should be the top cluster in the image. This turns out to not be the case. I think the cluster numbers in the file are ordered based on the tree branching.

As I do not want to waste time figuring out which cluster number corresponds to which cluster in the image, I wonder: Is there an option in pheatmap to label the clusters in the image?

RNA-Seq pheatmap clustering • 13k views

ADD COMMENT • link updated 7.0 years ago by sisterdot ▴ 40 • written 8.4 years ago by thomasterTW ▴ 20

score 4 · Answer 1 · 2017-12-15

you approach was very close to what i think you need to do (at least using pheatmap_1.0.8). the last step will give you the cluster assignment in the right order.

library(pheatmap)
k.fix=4
mat = matrix(rnorm(200), 20, 10)
res<-pheatmap(mat, cutree_rows=k.fix,show_rownames = T)
cutree(res$tree_row, k=k.fix)[res$tree_row[["order"]]]

pheatmap seems to use the setting of cutree_rows only for determining gaps

if (!is.na(cutree_rows)) {
     gaps_row = find_gaps(tree_row, cutree_rows)
}

and the find_gaps function does

    find_gaps = function(tree, cutree_n){
        v = cutree(tree, cutree_n)[tree$order]
        gaps = which((v[-1] - v[-length(v)]) != 0)
}

it does not appear as if the cluster assignment can be extracted from what the pheatmap function returns.

invisible(list(tree_row = tree_row, tree_col = tree_col,  kmeans = km, gtable = gt))

score 1 · Answer 2 · 2016-06-30

1

Entering edit mode

8.4 years ago

EagleEye 7.6k

To get the data in the exact order as in heatmap use this,

A: To extract row names of plotted heatmap

ADD COMMENT • link 8.4 years ago by EagleEye 7.6k

0

Entering edit mode

Thanks! That's what I need. However, when I try this method I get an error.

res2<-matrix[c(res$tree_row[["order"])]]
Error in `[.data.frame`(matrix, c(res$tree_row[["order"]])) :
  undefined columns selected

ADD REPLY • link 8.4 years ago by thomasterTW ▴ 20

0

Entering edit mode

I already fixed the problem. Should be res2<-matrix[c(res$tree_row[["order"]]),]

ADD REPLY • link 8.4 years ago by thomasterTW ▴ 20

0

Entering edit mode

I still can't get it to work :(

This it the output of pheatmap:

> res
$tree_row

Call:
hclust(d = d, method = method)

Cluster method   : complete
Distance         : euclidean
Number of objects: 6445

$tree_col
[1] NA

$kmeans
[1] NA

$gtable
TableGrob (5 x 6) "layout": 5 grobs
  z     cells      name                        grob
1 1 (1-1,3-3)      main         text[GRID.text.179]
2 2 (4-4,1-1)  row_tree polyline[GRID.polyline.180]
3 3 (4-4,3-3)    matrix       gTree[GRID.gTree.182]
4 4 (5-5,3-3) col_names         text[GRID.text.183]
5 5 (4-5,5-5)    legend       gTree[GRID.gTree.186]

When applying my original method ( b <- data.frame(cutree(res$tree_row, k=5))) the output looks like this

> head(b)
                   cutree.res.tree_row..k...5.
ENSMUSG00000000001                           1
ENSMUSG00000000028                           2
ENSMUSG00000000031                           3
ENSMUSG00000000056                           3
ENSMUSG00000000058                           1
ENSMUSG00000000078                           1

With the suggested method (res2<-matrix[c(res$tree_row[["order"]]),]) the output looks like this

> head(res2)
                      FPKM_1b    FPKM_2b   FPKM_3b   FPKM_4b   FPKM_5b   FPKM_6b
ENSMUSG00000043257  0.3025560  0.4250808 0.4468834 0.5655751 0.5719265 0.4285203
ENSMUSG00000020283  0.9988489  1.0104823 1.1025160 1.2239578 1.1445088 1.0986714
ENSMUSG00000028469 -0.1916256 -0.1084386 0.1324678 0.3324304 0.2095445 0.0737440
ENSMUSG00000021939  1.5766039  1.5866526 1.6339802 1.6939370 1.6958720 1.6544634
ENSMUSG00000068040  1.2922228  1.2956221 1.3249447 1.3735446 1.3566950 1.3483341
ENSMUSG00000062611  2.6689941  2.7180206 2.6920966 2.7256585 2.7129930 2.7073197

What do I need to do to get the output in the way of my original method? I only need the gene ID and in which cluster it is located.

ADD REPLY • link 8.4 years ago by thomasterTW ▴ 20