Question

To get genes for each clusters from heatmap

0

Entering edit mode

10.1 years ago

ChIP ▴ 600

Hi,

I have got a heatmap of differentially expressed genes between two groups, like this:

d<-read.table("Appended.Ratio.VraakData.txt", header=T)
row.names(d)<-d$gene
d<-d[,2:8]
dm<-data.matrix(d)
pvals <- apply(dm, 1, function(x) wilcox.test(x~c(rep(0,3), rep(1,4)))$p.value)
heatmap(dm[pvals<0.06,])

and the data looks like this:

            OFP1     OFP2     OFP3  Normal1  Normal2  Normal3  Normal4
A1BG    5.391881 5.464583 5.593600 5.931914 6.986908 6.408578 6.339434
A1CF    6.457622 6.494549 6.346922 6.352600 6.345581 6.419962 6.822000
A2BP1   6.825420 6.801401 6.828966 7.000525 6.485443 6.764178 6.561524
A2BP1_2 6.401335 6.403443 6.292540 6.113847 6.108393 6.253338 6.133938
A2BP1_3 8.550718 8.599124 8.689958 8.194579 7.919759 8.054256 7.905359
A2LD1   6.837739 6.922236 6.691391 7.260664 7.435045 7.413092 8.188466

How can I extract the row names for the heatmap in the order that they have been plotted? Also, row.names per cluster (as the rows also have been clustered)?

Thank you

heatmap R • 8.5k views

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 10.1 years ago by ChIP ▴ 600

Ram · Answer 1 · 2015-07-13

You should type:

heatmap

copy the function to a new script "heatmap2" like so:

heatmap2 <- function(...... ## from heatmap

add this line right before the final brace:

return(rev(labRow))

then call your same heatmap function but assign it to a variable:

t <- heatmap2(dm[pvals<0.06,])

This will output the row names used for the heatmap into variable t in your local environment. You can repeat for column names if you'd like, or add the variable assignment explicitly to the function.

I'm not sure what you mean about row names by cluster.

Here is the full code for the heatmap2 function below. I'm not sure if the copy/paste will work from Biostar's site. It will be super messy in any case, so I recommend doing it yourself as detailed above:

heatmap2 <- function (x, Rowv = NULL, Colv = if (symm) "Rowv" else NULL,
                      distfun = dist, hclustfun = hclust, reorderfun = function(d,
                                                                                w) reorder(d, w), add.expr, symm = FALSE, revC = identical(Colv, 
                                                                                                                                           "Rowv"), scale = c("row", "column", "none"), na.rm = TRUE, 
                      margins = c(5, 5), ColSideColors, RowSideColors, cexRow = 0.2 + 
                              1/log10(nr), cexCol = 0.2 + 1/log10(nc), labRow = NULL, 
                      labCol = NULL, main = NULL, xlab = NULL, ylab = NULL, keep.dendro = FALSE, 
                      verbose = getOption("verbose"), ...) 
{
        scale <- if (symm && missing(scale)) 
                "none"
        else match.arg(scale)
        if (length(di <- dim(x)) != 2 || !is.numeric(x)) 
                stop("'x' must be a numeric matrix")
        nr <- di[1L]
        nc <- di[2L]
        if (nr <= 1 || nc <= 1) 
                stop("'x' must have at least 2 rows and 2 columns")
        if (!is.numeric(margins) || length(margins) != 2L) 
                stop("'margins' must be a numeric vector of length 2")
        doRdend <- !identical(Rowv, NA)
        doCdend <- !identical(Colv, NA)
        if (!doRdend && identical(Colv, "Rowv")) 
                doCdend <- FALSE
        if (is.null(Rowv)) 
                Rowv <- rowMeans(x, na.rm = na.rm)
        if (is.null(Colv)) 
                Colv <- colMeans(x, na.rm = na.rm)
        if (doRdend) {
                if (inherits(Rowv, "dendrogram")) 
                        ddr <- Rowv
                else {
                        hcr <- hclustfun(distfun(x))
                        ddr <- as.dendrogram(hcr)
                        if (!is.logical(Rowv) || Rowv) 
                                ddr <- reorderfun(ddr, Rowv)
                }
                if (nr != length(rowInd <- order.dendrogram(ddr))) 
                        stop("row dendrogram ordering gave index of wrong length")
        }
        else rowInd <- 1L:nr
        if (doCdend) {
                if (inherits(Colv, "dendrogram")) 
                        ddc <- Colv
                else if (identical(Colv, "Rowv")) {
                        if (nr != nc) 
                                stop("Colv = \"Rowv\" but nrow(x) != ncol(x)")
                        ddc <- ddr
                }
                else {
                        hcc <- hclustfun(distfun(if (symm) 
                                x
                                else t(x)))
                        ddc <- as.dendrogram(hcc)
                        if (!is.logical(Colv) || Colv) 
                                ddc <- reorderfun(ddc, Colv)
                }
                if (nc != length(colInd <- order.dendrogram(ddc))) 
                        stop("column dendrogram ordering gave index of wrong length")
        }
        else colInd <- 1L:nc
        x <- x[rowInd, colInd]
        labRow <- if (is.null(labRow)) 
                if (is.null(rownames(x))) 
                        (1L:nr)[rowInd]
        else rownames(x)
        else labRow[rowInd]
        labCol <- if (is.null(labCol)) 
                if (is.null(colnames(x))) 
                        (1L:nc)[colInd]
        else colnames(x)
        else labCol[colInd]
        if (scale == "row") {
                x <- sweep(x, 1L, rowMeans(x, na.rm = na.rm), check.margin = FALSE)
                sx <- apply(x, 1L, sd, na.rm = na.rm)
                x <- sweep(x, 1L, sx, "/", check.margin = FALSE)
        }
        else if (scale == "column") {
                x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
                sx <- apply(x, 2L, sd, na.rm = na.rm)
                x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
        }
        lmat <- rbind(c(NA, 3), 2:1)
        lwid <- c(if (doRdend) 1 else 0.05, 4)
        lhei <- c((if (doCdend) 1 else 0.05) + if (!is.null(main)) 0.2 else 0, 
                  4)
        if (!missing(ColSideColors)) {
                if (!is.character(ColSideColors) || length(ColSideColors) != 
                            nc) 
                        stop("'ColSideColors' must be a character vector of length ncol(x)")
                lmat <- rbind(lmat[1, ] + 1, c(NA, 1), lmat[2, ] + 1)
                lhei <- c(lhei[1L], 0.2, lhei[2L])
        }
        if (!missing(RowSideColors)) {
                if (!is.character(RowSideColors) || length(RowSideColors) != 
                            nr) 
                        stop("'RowSideColors' must be a character vector of length nrow(x)")
                lmat <- cbind(lmat[, 1] + 1, c(rep(NA, nrow(lmat) - 1), 
                                               1), lmat[, 2] + 1)
                lwid <- c(lwid[1L], 0.2, lwid[2L])
        }
        lmat[is.na(lmat)] <- 0
        if (verbose) {
                cat("layout: widths = ", lwid, ", heights = ", lhei, 
                    "; lmat=\n")
                print(lmat)
        }
        dev.hold()
        on.exit(dev.flush())
        op <- par(no.readonly = TRUE)
        on.exit(par(op), add = TRUE)
        layout(lmat, widths = lwid, heights = lhei, respect = TRUE)
        if (!missing(RowSideColors)) {
                par(mar = c(margins[1L], 0, 0, 0.5))
                image(rbind(if (revC) 
                        nr:1L
                        else 1L:nr), col = RowSideColors[rowInd], axes = FALSE)
        }
        if (!missing(ColSideColors)) {
                par(mar = c(0.5, 0, 0, margins[2L]))
                image(cbind(1L:nc), col = ColSideColors[colInd], axes = FALSE)
        }
        par(mar = c(margins[1L], 0, 0, margins[2L]))
        if (!symm || scale != "none") 
                x <- t(x)
        if (revC) {
                iy <- nr:1
                if (doRdend) 
                        ddr <- rev(ddr)
                x <- x[, iy]
        }
        else iy <- 1L:nr
        image(1L:nc, 1L:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 + 
                      c(0, nr), axes = FALSE, xlab = "", ylab = "", ...)
        axis(1, 1L:nc, labels = labCol, las = 2, line = -0.5, tick = 0, 
             cex.axis = cexCol)
        if (!is.null(xlab)) 
                mtext(xlab, side = 1, line = margins[1L] - 1.25)
        axis(4, iy, labels = labRow, las = 2, line = -0.5, tick = 0, 
             cex.axis = cexRow)
        if (!is.null(ylab)) 
                mtext(ylab, side = 4, line = margins[2L] - 1.25)
        if (!missing(add.expr)) 
                eval(substitute(add.expr))
        par(mar = c(margins[1L], 0, 0, 0))
        if (doRdend) 
                plot(ddr, horiz = TRUE, axes = FALSE, yaxs = "i", leaflab = "none")
        else frame()
        par(mar = c(0, 0, if (!is.null(main)) 1 else 0, margins[2L]))
        if (doCdend) 
                plot(ddc, axes = FALSE, xaxs = "i", leaflab = "none")
        else if (!is.null(main)) 
                frame()
        if (!is.null(main)) {
                par(xpd = NA)
                title(main, cex.main = 1.5 * op[["cex.main"]])
        }
        invisible(list(rowInd = rowInd, colInd = colInd, Rowv = if (keep.dendro && 
                                                                            doRdend) ddr, Colv = if (keep.dendro && doCdend) ddc))
        return(rev(labRow))
}

Ram · Answer 2 · 2015-07-13

I have run into issues with extracting clusters directly from the heat map function. Thus, I prefer to cluster using hclust(), and then pass in the hclust object as a dendrogram to manually to the plotting function to set the order of the rows/cols. You can then use cutree() on the hclust object to get any number of clusters you want. Just adjust the dist() parameters to calculate dissimilarity differently, and adjust the hclust() parameters to cluster differently.

hc <- hclust(dist(my_matrix, method = "euclidean"), method = "complete")
ct <- cutree(hc, k = num_clusters)
heatmap.2(my_matrix, Colv = as.dendrogram(hc), trace="none")

Ram · Answer 3 · 2015-07-13

2

Entering edit mode

10.1 years ago

Zhilong Jia ★ 2.2k

The cluster core of heatmap is two parameters hclustfun and distfun (see ?heatmap). So just use the same clustering methods and distfun distance independently, you will get the same clustering result and then get the genes in clusters.

ADD COMMENT • link updated 2.7 years ago by Ram 45k • written 10.1 years ago by Zhilong Jia ★ 2.2k

0

Entering edit mode

Zhilong Jia can you suggest the way using two parameters hclustfun and distfun for clustering and then to get genes in clusters

Example matrix

                           KO1          KO2          KO3         KO4          WT1         WT2          WT3          WT4
ENSMUSG00000105261  1.97669144 -0.987069146 -1.722779121 -1.65603203  2.224570144  0.17856281  0.375775839 -0.389719944
ENSMUSG00000028772 -0.82866161 -0.896702497 -0.932372291 -0.95340173  0.925307100  0.92632032  0.914009132  0.845501575
ENSMUSG00000001467  0.89745163  0.813375968  0.934779401  0.88236705 -0.867636578 -0.84198944 -0.836182175 -0.982165859
ENSMUSG00000064349 -0.79347352 -0.766544295 -0.681390227 -0.92332469  0.771244099  0.82480434  0.788785224  0.779899074
ENSMUSG00000064369 -0.77075330 -0.896571993 -0.647333799 -0.80654659  0.796070516  0.78240275  0.766087554  0.776644868
ENSMUSG00000055323 -0.76369956 -0.747911581 -0.622770909 -0.77776417  0.812804888  0.68977231  0.757876866  0.651692154
ENSMUSG00000077990 -0.74844981 -0.689830627 -0.616611493 -0.81558639  0.646501644  0.76065698  0.756089610  0.707230091
ENSMUSG00000015750 -0.70077222 -0.759704536 -0.741144048 -0.66762535  0.626911815  0.72953121  0.800841017  0.711962110
ENSMUSG00000097157 -0.63640395 -0.835319068 -0.682088025 -0.67325275  0.911121746  0.69891432  0.559129727  0.657898002

ADD REPLY • link updated 2.7 years ago by Ram 45k • written 8.1 years ago by macmath ▴ 170

2

Entering edit mode

use hclust function with "euclidean" distance by default. you can find the pattern in the screenshot with the code.

require(graphics)
 hc <- hclust(dist(USArrests), "ave")
hc$order
hc$labels
plot(hc)

enter image description here

ADD REPLY • link 8.1 years ago by Zhilong Jia ★ 2.2k