Question

How to get the order of clustered genes of heatmap.2 to a .csv file?

1

5.2 years ago

WUSCHEL ▴ 810

I have a data frame of omics data. Gene ids in rows (931), and samples in columns (15).

> dim(my_data) # (rows columns)
[1] 931  16

I created a heatmap using library(gplots):

cn=colnames(gdf1)[c(13:15,1:12)]

col <- colorRampPalette(c("red","yellow","darkgreen"))(30)

heatmap.2(as.matrix(gdf1[,cn]), 
          dendrogram = "row", 
          Colv = FALSE, 
          Rowv = TRUE,
          scale = "none", 
          col = col,
          key = TRUE, 
          density.info = "none", 
          key.title = NA, 
          key.xlab = "Abundance",
          trace = "none",
          margins = c(7, 15))

However, the heatmap only shows labels for a few genes, because it contains ~900 genes. How can I export the genes in each cluster, in the same order as in the heatmap?

Also, how can I reduce the size of the color key?

Thank you.

RNA-Seq R • 16k views

ADD COMMENT • link updated 5.2 years ago by Kevin Blighe 88k • written 5.2 years ago by WUSCHEL ▴ 810

score 6 · Accepted Answer · 2019-09-12

6

5.2 years ago

Kevin Blighe 88k

Edit 13th September, 2019: To additionally see how to extract clusters of genes from the heatmap dendrogram, scroll down to this later comment: C: How to get the order of clustered genes of a heat map to a .csv file?

---------------

Hello,

First, create random data

mat <- matrix(rexp(200, rate=.1), ncol=20)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))
mat[1:5,1:5]
         sample1   sample2   sample3   sample4     sample5
gene1  0.6247039  3.020142  8.303563  6.482744  0.59547154
gene2  2.6650871  3.375123  5.778222 19.410709  0.07966728
gene3  4.6343755  5.491166  8.716883  9.490372 29.03157875
gene4 13.6086878  3.632815 10.688699  1.263853  2.54216953
gene5  2.4060078 14.283380  8.592085  3.998141  0.25853135

Generate a heatmap and save it to `out`

out <- heatmap.2(mat)

Obtain list of genes, ordered as per heatmap (from bottom, up):

rownames(mat)[out$rowInd]
 [1] "gene2"  "gene9"  "gene7"  "gene8"  "gene4"  "gene5"  "gene3"  "gene6" 
 [9] "gene1"  "gene10"

Plot the row dendrogram on its own:

plot(out$rowDendrogram)

Change colour key size

Use the keysize parameter of heatmap.2() (the default is 1.5; smaller values shrink the key).
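As a minimal sketch of the keysize suggestion, assuming gplots is installed: the seed, the 8 x 8 inch PDF size and the output file name below are illustrative choices, not part of the original answer.

```r
library(gplots)

set.seed(1)  # illustrative seed so the random matrix is reproducible
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

# Draw to a PDF so the figure size is fixed; the file name is a placeholder.
pdf(file.path(tempdir(), "heatmap_keysize.pdf"), width = 8, height = 8)
out <- heatmap.2(
  mat,
  trace = "none",
  density.info = "none",
  keysize = 1)  # smaller than the default of 1.5, so the key takes less room
invisible(dev.off())
```

If the key becomes too small for its own margins, R may stop with a "figure margins too large" error; a larger device or a slightly larger keysize usually avoids that.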

----------------------------------

See also here for pheatmap: A: extract dendrogram cluster from pheatmap

Kevin

ADD COMMENT • link 5.2 years ago by Kevin Blighe 88k

1

Thanks a heap, Kevin :) This is helpful. BTW, I tried this just now. Since I have ~900 genes, it is too much to plot in a diagram. How can I export these clustered genes to a csv file, in the same order as shown in the dendrogram?

ADD REPLY • link 5.2 years ago by WUSCHEL ▴ 810

2

Hey, do you just mean like this?

write.table(
  data.frame(gene = rownames(mat)[out$rowInd]),
  'out.csv',
  row.names = FALSE,
  quote = FALSE,
  sep = ',')

Remember that the order is bottom-to-top
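Because out$rowInd runs bottom-to-top, a file that should read like the heatmap from the top down needs the order reversed. A minimal sketch, assuming a stand-in index vector: row_ind is computed here with hclust() only so the example runs without drawing a heatmap, and in practice it would be out$rowInd. The seed, the column names and the output path are illustrative.

```r
set.seed(1)  # illustrative seed
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

# Stand-in for out$rowInd (assumption); in practice: row_ind <- out$rowInd
row_ind <- hclust(dist(mat))$order

ordered <- data.frame(
  position_from_top = seq_along(row_ind),
  gene = rev(rownames(mat)[row_ind]),  # rev() turns bottom-to-top into top-to-bottom
  stringsAsFactors = FALSE)

csv_file <- file.path(tempdir(), "genes_top_to_bottom.csv")  # placeholder path
write.csv(ordered, csv_file, row.names = FALSE, quote = FALSE)
```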

ADD REPLY • link 5.2 years ago by Kevin Blighe 88k

1

Thanks Kevin, this is also useful. I am looking for much more detailed output, and I am not sure whether this is possible in heatmap.2.

For a selected cutoff (e.g. distance = 40), how can we separate out the list of genes in each cluster?

Something like this: how can I see the grouping of genes for each cluster in the output?

ADD REPLY • link 5.2 years ago by WUSCHEL ▴ 810

2

Ah - I see. In that case, it is easier to create your own dendrogram outside heatmap.2(), and then use cutree() on that:

Create random data

mat <- matrix(rexp(200, rate=.1), ncol=20)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))

Cluster the genes (rows) manually

row_clust <- hclust(dist(mat, method = 'euclidean'), method = 'ward.D2')

Plot the heatmap

require(gplots)
out <- heatmap.2(
  mat,
  Rowv = as.dendrogram(row_clust))

plot(row_clust)

The two dendrograms are the same.

Cut the dendrogram into groups or specify a height for the cut-off:

#2 groups
sort(cutree(row_clust, k=2))
 gene1  gene2  gene3  gene4  gene6  gene7  gene8  gene9 gene10  gene5 
     1      1      1      1      1      1      1      1      1      2 

#5 groups
sort(cutree(row_clust, k=5))
 gene1  gene9  gene2  gene3  gene4  gene8 gene10  gene5  gene6  gene7 
     1      1      2      3      3      3      3      4      5      5 


# specify a height of 70
sort(cutree(row_clust, h = 70))
 gene1  gene2  gene3  gene4  gene5  gene6  gene7  gene8  gene9 gene10 
     1      1      2      2      3      4      4      2      1      2 

plot(row_clust)
abline(h = 70, col = "red2", lty = 2, lwd = 2)
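To get the genes of each cluster as separate lists, the named vector returned by cutree() can be passed to split(). A minimal sketch using the same kind of random data; the seed and the variable names clusters and genes_by_cluster are illustrative, and the number of clusters at h = 70 depends on the data.

```r
set.seed(1)  # illustrative seed
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

row_clust <- hclust(dist(mat, method = "euclidean"), method = "ward.D2")

# Named integer vector: names are gene ids, values are cluster ids
clusters <- cutree(row_clust, h = 70)

# One character vector of gene ids per cluster
genes_by_cluster <- split(names(clusters), clusters)

genes_by_cluster           # list of gene ids, keyed by cluster id
lengths(genes_by_cluster)  # number of genes in each cluster
```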

----------------------

You should be able to output these lists in the same order as the heatmap / dendrogram, too (the indices are stored in out$rowInd).
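A minimal sketch of that export, assuming the dendrogram is passed to heatmap.2() via Rowv as above, in which case out$rowInd should match row_clust$order. The sketch uses row_clust$order so it runs without drawing the heatmap; the seed, k = 5 and the output path are illustrative.

```r
set.seed(1)  # illustrative seed
mat <- matrix(rexp(200, rate = 0.1), ncol = 20)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

row_clust <- hclust(dist(mat, method = "euclidean"), method = "ward.D2")
clusters <- cutree(row_clust, k = 5)  # cluster id per gene, in original row order

ord <- row_clust$order  # leaf order of the dendrogram (bottom-to-top in the heatmap)

result <- data.frame(
  gene = row_clust$labels[ord],
  cluster = unname(clusters[ord]),
  stringsAsFactors = FALSE)

csv_file <- file.path(tempdir(), "genes_with_clusters.csv")  # placeholder path
write.csv(result, csv_file, row.names = FALSE)
```

Genes from the same cluster end up on consecutive rows, because each cluster is one branch of the dendrogram. Note that cutree() numbers clusters by first appearance in the input rows, so the cluster ids need not increase along the file.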

ADD REPLY • link 5.2 years ago by Kevin Blighe 88k

1

Thanks a heap, Kevin :) Appreciate.

ADD REPLY • link 5.2 years ago by WUSCHEL ▴ 810

0

Hi Kevin:

Is it possible to do the same in ComplexHeatmap? When I pass the above argument to Heatmap(), it throws an error:

Error in Heatmap(mydat_2, name = "mat", Rowv = as.dendrogram(row_clust), : unused argument (Rowv = as.dendrogram(row_clust))

I read the ComplexHeatmap manual (the part on splitting by dendrogram), but I do not fully understand how it can be done in practice.

ADD REPLY • link 4.5 years ago by Kai_Qi ▴ 130

1

For ComplexHeatmap, it's a bit different. I think that you want to do this:

https://github.com/jokergoo/ComplexHeatmap/issues/136#issuecomment-349603514

?

Or is this what you want to do:

A: Complex Heatmap: Changing order of clusters

?

ADD REPLY • link 4.5 years ago by Kevin Blighe 88k

0

Hi Kevin:

Thank you for the link. When I tried to reproduce what the code does, I got an error I can't understand. My matrix contains 5 columns: column1 to column4 are the counts at different stages, while column5 is the difference between column3 and column1 (diff = column3 - column1). The row names of the matrix are the genes.

I use row_clust <- hclust(dist(mydat_2, method = 'euclidean'), method = 'ward.D2') to do the clustering manually. And I use

HM <- Heatmap(mydat_2, name = "mat", cluster_rows = row_clust, col = col_fun, column_order = colnames(mydat_2), column_title = "Developmental Stages",
               column_title_side = "bottom", row_title="Retained_introns", show_row_names = FALSE)
HM <- draw(HM)

to draw the heatmap. The heatmap looks beautiful. I have 2 questions on how to make it better:

1. The values in column5 differ somewhat from those in column1 to column4, because column5 is a difference (column3 - column1) and contains negative values. So the colors cannot exactly match what I set for columns 1 to 4. In ComplexHeatmap, is it possible to place column5 side by side with columns 1 to 4 (same gene clustering) but with a different color palette?

2. When I run the code from GitHub, I get an error that I cannot understand:

for (i in 1:length(row_order(HM))){
  if (i == 1) {
    clu <- t(t(row.names(mydat_2[row_order(HM)[[i]],])))
    out1 <- cbind(clu, paste("cluster", i, sep=""))
    colnames(out1) <- c("coordinates", "Cluster")
  } else {
    clu <- t(t(row.names(mydat_2[row_order(HM)[[i]],])))
    clu <- cbind(clu, paste("cluster", i, sep=""))
    out1 <- rbind(out1, clu)
  }
}

Error in t.default(row.names(mydat_2[row_order(HM)[[i]], ])) : argument is not a matrix

Since mydat_2 is a matrix, why is the result of extracting its rows not a matrix?

I hope I have expressed myself clearly. Thanks for any advice,

ADD REPLY • link 4.5 years ago by Kai_Qi ▴ 130
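One likely explanation for the error above, sketched in base R only: if row_order(HM) is a plain integer vector rather than a list (which may be the case when the heatmap is not split into slices), then row_order(HM)[[i]] is a single index, and subsetting a matrix with a single row index drops it to a vector that has no row names. The toy matrix m is hypothetical and stands in for mydat_2.

```r
# Toy 3 x 2 matrix standing in for mydat_2 (hypothetical data)
m <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("gene", 1:3), c("a", "b")))

one_row <- m[2, ]             # a single row index drops the result to a named vector
is.matrix(one_row)            # FALSE
row.names(one_row)            # NULL, and t(NULL) fails with "argument is not a matrix"

kept <- m[2, , drop = FALSE]  # drop = FALSE keeps a 1-row matrix
row.names(kept)               # "gene2"

# Indexing the row names directly avoids subsetting the matrix at all
rownames(m)[2]                # "gene2"
```

If this is the cause, rownames(mydat_2)[row_order(HM)] would already give all genes in heatmap order, without the loop.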
