To get genes for each clusters from heatmap
3
0
Entering edit mode
9.4 years ago
ChIP ▴ 600

Hi,

I have got a heatmap of differentially expressed genes between two groups, like this:

d<-read.table("Appended.Ratio.VraakData.txt", header=T)
row.names(d)<-d$gene
d<-d[,2:8]
dm<-data.matrix(d)
pvals <- apply(dm, 1, function(x) wilcox.test(x~c(rep(0,3), rep(1,4)))$p.value)
heatmap(dm[pvals<0.06,])

and the data looks like this:

            OFP1     OFP2     OFP3  Normal1  Normal2  Normal3  Normal4
A1BG    5.391881 5.464583 5.593600 5.931914 6.986908 6.408578 6.339434
A1CF    6.457622 6.494549 6.346922 6.352600 6.345581 6.419962 6.822000
A2BP1   6.825420 6.801401 6.828966 7.000525 6.485443 6.764178 6.561524
A2BP1_2 6.401335 6.403443 6.292540 6.113847 6.108393 6.253338 6.133938
A2BP1_3 8.550718 8.599124 8.689958 8.194579 7.919759 8.054256 7.905359
A2LD1   6.837739 6.922236 6.691391 7.260664 7.435045 7.413092 8.188466

How can I extract the row names for the heatmap in the order that they have been plotted? Also, row.names per cluster (as the rows also have been clustered)?

Thank you

heatmap R • 8.0k views
ADD COMMENT
3
Entering edit mode
9.4 years ago
Steven Lakin ★ 1.8k

You should type:

heatmap

copy the function to a new script "heatmap2" like so:

heatmap2 <- function(...... ## from heatmap

add this line right before the final brace:

return(rev(labRow))

then call your same heatmap function but assign it to a variable:

t <- heatmap2(dm[pvals<0.06,])

This will output the row names used for the heatmap into variable t in your local environment. You can repeat for column names if you'd like, or add the variable assignment explicitly to the function.

I'm not sure what you mean about row names by cluster.

Here is the full code for the heatmap2 function below. I'm not sure if the copy/paste will work from Biostar's site. It will be super messy in any case, so I recommend doing it yourself as detailed above:

heatmap2 <- function (x, Rowv = NULL, Colv = if (symm) "Rowv" else NULL,
                      distfun = dist, hclustfun = hclust, reorderfun = function(d,
                                                                                w) reorder(d, w), add.expr, symm = FALSE, revC = identical(Colv, 
                                                                                                                                           "Rowv"), scale = c("row", "column", "none"), na.rm = TRUE, 
                      margins = c(5, 5), ColSideColors, RowSideColors, cexRow = 0.2 + 
                              1/log10(nr), cexCol = 0.2 + 1/log10(nc), labRow = NULL, 
                      labCol = NULL, main = NULL, xlab = NULL, ylab = NULL, keep.dendro = FALSE, 
                      verbose = getOption("verbose"), ...) 
{
        scale <- if (symm && missing(scale)) 
                "none"
        else match.arg(scale)
        if (length(di <- dim(x)) != 2 || !is.numeric(x)) 
                stop("'x' must be a numeric matrix")
        nr <- di[1L]
        nc <- di[2L]
        if (nr <= 1 || nc <= 1) 
                stop("'x' must have at least 2 rows and 2 columns")
        if (!is.numeric(margins) || length(margins) != 2L) 
                stop("'margins' must be a numeric vector of length 2")
        doRdend <- !identical(Rowv, NA)
        doCdend <- !identical(Colv, NA)
        if (!doRdend && identical(Colv, "Rowv")) 
                doCdend <- FALSE
        if (is.null(Rowv)) 
                Rowv <- rowMeans(x, na.rm = na.rm)
        if (is.null(Colv)) 
                Colv <- colMeans(x, na.rm = na.rm)
        if (doRdend) {
                if (inherits(Rowv, "dendrogram")) 
                        ddr <- Rowv
                else {
                        hcr <- hclustfun(distfun(x))
                        ddr <- as.dendrogram(hcr)
                        if (!is.logical(Rowv) || Rowv) 
                                ddr <- reorderfun(ddr, Rowv)
                }
                if (nr != length(rowInd <- order.dendrogram(ddr))) 
                        stop("row dendrogram ordering gave index of wrong length")
        }
        else rowInd <- 1L:nr
        if (doCdend) {
                if (inherits(Colv, "dendrogram")) 
                        ddc <- Colv
                else if (identical(Colv, "Rowv")) {
                        if (nr != nc) 
                                stop("Colv = \"Rowv\" but nrow(x) != ncol(x)")
                        ddc <- ddr
                }
                else {
                        hcc <- hclustfun(distfun(if (symm) 
                                x
                                else t(x)))
                        ddc <- as.dendrogram(hcc)
                        if (!is.logical(Colv) || Colv) 
                                ddc <- reorderfun(ddc, Colv)
                }
                if (nc != length(colInd <- order.dendrogram(ddc))) 
                        stop("column dendrogram ordering gave index of wrong length")
        }
        else colInd <- 1L:nc
        x <- x[rowInd, colInd]
        labRow <- if (is.null(labRow)) 
                if (is.null(rownames(x))) 
                        (1L:nr)[rowInd]
        else rownames(x)
        else labRow[rowInd]
        labCol <- if (is.null(labCol)) 
                if (is.null(colnames(x))) 
                        (1L:nc)[colInd]
        else colnames(x)
        else labCol[colInd]
        if (scale == "row") {
                x <- sweep(x, 1L, rowMeans(x, na.rm = na.rm), check.margin = FALSE)
                sx <- apply(x, 1L, sd, na.rm = na.rm)
                x <- sweep(x, 1L, sx, "/", check.margin = FALSE)
        }
        else if (scale == "column") {
                x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
                sx <- apply(x, 2L, sd, na.rm = na.rm)
                x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
        }
        lmat <- rbind(c(NA, 3), 2:1)
        lwid <- c(if (doRdend) 1 else 0.05, 4)
        lhei <- c((if (doCdend) 1 else 0.05) + if (!is.null(main)) 0.2 else 0, 
                  4)
        if (!missing(ColSideColors)) {
                if (!is.character(ColSideColors) || length(ColSideColors) != 
                            nc) 
                        stop("'ColSideColors' must be a character vector of length ncol(x)")
                lmat <- rbind(lmat[1, ] + 1, c(NA, 1), lmat[2, ] + 1)
                lhei <- c(lhei[1L], 0.2, lhei[2L])
        }
        if (!missing(RowSideColors)) {
                if (!is.character(RowSideColors) || length(RowSideColors) != 
                            nr) 
                        stop("'RowSideColors' must be a character vector of length nrow(x)")
                lmat <- cbind(lmat[, 1] + 1, c(rep(NA, nrow(lmat) - 1), 
                                               1), lmat[, 2] + 1)
                lwid <- c(lwid[1L], 0.2, lwid[2L])
        }
        lmat[is.na(lmat)] <- 0
        if (verbose) {
                cat("layout: widths = ", lwid, ", heights = ", lhei, 
                    "; lmat=\n")
                print(lmat)
        }
        dev.hold()
        on.exit(dev.flush())
        op <- par(no.readonly = TRUE)
        on.exit(par(op), add = TRUE)
        layout(lmat, widths = lwid, heights = lhei, respect = TRUE)
        if (!missing(RowSideColors)) {
                par(mar = c(margins[1L], 0, 0, 0.5))
                image(rbind(if (revC) 
                        nr:1L
                        else 1L:nr), col = RowSideColors[rowInd], axes = FALSE)
        }
        if (!missing(ColSideColors)) {
                par(mar = c(0.5, 0, 0, margins[2L]))
                image(cbind(1L:nc), col = ColSideColors[colInd], axes = FALSE)
        }
        par(mar = c(margins[1L], 0, 0, margins[2L]))
        if (!symm || scale != "none") 
                x <- t(x)
        if (revC) {
                iy <- nr:1
                if (doRdend) 
                        ddr <- rev(ddr)
                x <- x[, iy]
        }
        else iy <- 1L:nr
        image(1L:nc, 1L:nr, x, xlim = 0.5 + c(0, nc), ylim = 0.5 + 
                      c(0, nr), axes = FALSE, xlab = "", ylab = "", ...)
        axis(1, 1L:nc, labels = labCol, las = 2, line = -0.5, tick = 0, 
             cex.axis = cexCol)
        if (!is.null(xlab)) 
                mtext(xlab, side = 1, line = margins[1L] - 1.25)
        axis(4, iy, labels = labRow, las = 2, line = -0.5, tick = 0, 
             cex.axis = cexRow)
        if (!is.null(ylab)) 
                mtext(ylab, side = 4, line = margins[2L] - 1.25)
        if (!missing(add.expr)) 
                eval(substitute(add.expr))
        par(mar = c(margins[1L], 0, 0, 0))
        if (doRdend) 
                plot(ddr, horiz = TRUE, axes = FALSE, yaxs = "i", leaflab = "none")
        else frame()
        par(mar = c(0, 0, if (!is.null(main)) 1 else 0, margins[2L]))
        if (doCdend) 
                plot(ddc, axes = FALSE, xaxs = "i", leaflab = "none")
        else if (!is.null(main)) 
                frame()
        if (!is.null(main)) {
                par(xpd = NA)
                title(main, cex.main = 1.5 * op[["cex.main"]])
        }
        invisible(list(rowInd = rowInd, colInd = colInd, Rowv = if (keep.dendro && 
                                                                            doRdend) ddr, Colv = if (keep.dendro && doCdend) ddc))
        return(rev(labRow))
}
ADD COMMENT
0
Entering edit mode

Hi,

Thank you for your answer.

How can I get the gene names cluster wiseatleast based upon the broad clusters. for instance say 4 clusters.

thank you.

ADD REPLY
0
Entering edit mode

hello,

suppose that i have already downloaded GSE63706 and normalized that and i have a normalized text file now. and i have also a list of probsets (a text file of my interest probsets) in this array...i want to have a heat map showing the expression pattern of my interest probsets in this array, for example in this array i have 4 varieties and different tissues (rind and flesh) and phases (0,10,20,30,40 and 50 days after harvesting). heat maps showing the expression pattern of my probsets in varieties, tissues and phases i mean

please

ADD REPLY
3
Entering edit mode
9.4 years ago
alolex ▴ 960

I have run into issues with extracting clusters directly from the heat map function. Thus, I prefer to cluster using hclust(), and then pass in the hclust object as a dendrogram to manually to the plotting function to set the order of the rows/cols. You can then use cutree() on the hclust object to get any number of clusters you want. Just adjust the dist() parameters to calculate dissimilarity differently, and adjust the hclust() parameters to cluster differently.

hc <- hclust(dist(my_matrix, method = "euclidean"), method = "complete")
ct <- cutree(hc, k = num_clusters)
heatmap.2(my_matrix, Colv = as.dendrogram(hc), trace="none")
ADD COMMENT
2
Entering edit mode
9.4 years ago
Zhilong Jia ★ 2.2k

The cluster core of heatmap is two parameters hclustfun and distfun (see ?heatmap). So just use the same clustering methods and distfun distance independently, you will get the same clustering result and then get the genes in clusters.

ADD COMMENT
0
Entering edit mode

Zhilong Jia can you suggest the way using two parameters hclustfun and distfun for clustering and then to get genes in clusters

Example matrix

                           KO1          KO2          KO3         KO4          WT1         WT2          WT3          WT4
ENSMUSG00000105261  1.97669144 -0.987069146 -1.722779121 -1.65603203  2.224570144  0.17856281  0.375775839 -0.389719944
ENSMUSG00000028772 -0.82866161 -0.896702497 -0.932372291 -0.95340173  0.925307100  0.92632032  0.914009132  0.845501575
ENSMUSG00000001467  0.89745163  0.813375968  0.934779401  0.88236705 -0.867636578 -0.84198944 -0.836182175 -0.982165859
ENSMUSG00000064349 -0.79347352 -0.766544295 -0.681390227 -0.92332469  0.771244099  0.82480434  0.788785224  0.779899074
ENSMUSG00000064369 -0.77075330 -0.896571993 -0.647333799 -0.80654659  0.796070516  0.78240275  0.766087554  0.776644868
ENSMUSG00000055323 -0.76369956 -0.747911581 -0.622770909 -0.77776417  0.812804888  0.68977231  0.757876866  0.651692154
ENSMUSG00000077990 -0.74844981 -0.689830627 -0.616611493 -0.81558639  0.646501644  0.76065698  0.756089610  0.707230091
ENSMUSG00000015750 -0.70077222 -0.759704536 -0.741144048 -0.66762535  0.626911815  0.72953121  0.800841017  0.711962110
ENSMUSG00000097157 -0.63640395 -0.835319068 -0.682088025 -0.67325275  0.911121746  0.69891432  0.559129727  0.657898002
ADD REPLY
2
Entering edit mode

use hclust function with "euclidean" distance by default. you can find the pattern in the screenshot with the code.

require(graphics)
 hc <- hclust(dist(USArrests), "ave")
hc$order
hc$labels
plot(hc)

enter image description here enter image description here

ADD REPLY

Login before adding your answer.

Traffic: 2343 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6