Question

Scale Data Before Drawing Heatmap Or Using heatmap(..., scale="column") In R?

3

13.6 years ago

C Shao ▴ 140

Hi everyone,

The "scale" argument of heatmap confuses me. Scaling the data before drawing the heatmap and calling heatmap(..., scale="XXX", ...) give different results.

For example:

mtscaled <- as.matrix(scale(mtcars))

heatmap(mtscaled, scale='none') 

xx.1 <- as.matrix(mtcars)

heatmap(xx.1, scale='column')

produce different clustering results.

Does anyone have an idea about this? If both ways are reasonable, which one should I choose?

Thanks a lot!

heatmap r • 48k views

ADD COMMENT • link updated 13.6 years ago by Michael 56k • written 13.6 years ago by C Shao ▴ 140

0

The one that shows your data better :)

ADD REPLY • link 13.6 years ago by Vladimir Chupakhin ▴ 520

score 14 · Answer 1 · 2011-12-11

14

13.6 years ago

Michael 56k

The difference is that heatmap applies the scaling only after the dendrograms have been computed, so the clustering is done on the unscaled data. The code in heatmap doesn't call scale, but the numeric results are the same. For data on very different scales (e.g. horsepower, number of cylinders) it might be better to scale before clustering, as you did. To me, the column dendrogram you get when scaling before calling heatmap looks more sensible.
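A quick way to check this on mtcars is to compare the row and column orderings that heatmap returns invisibly. This is only a sketch: the variable names are made up for illustration, and the null graphics device is an assumption so that it also runs in a non-interactive session.

```r
# Null graphics device so the sketch runs without opening a window;
# drop this line (and dev.off()) to actually see the plots.
pdf(NULL)

x <- as.matrix(mtcars)

# heatmap() invisibly returns the row/column orderings it used.
h_none   <- heatmap(x, scale = "none")         # no scaling at all
h_column <- heatmap(x, scale = "column")       # scaling done inside heatmap
h_pre    <- heatmap(scale(x), scale = "none")  # scaling done before heatmap

# If scaling happens after clustering, scale = "column" must give the
# same ordering as scale = "none" on the same input.
same_rows <- identical(h_none$rowInd, h_column$rowInd)
same_cols <- identical(h_none$colInd, h_column$colInd)

# Pre-scaled input is clustered on the scaled values, so this may differ.
pre_matches <- identical(h_pre$rowInd, h_column$rowInd)

invisible(dev.off())
print(c(same_rows = same_rows, same_cols = same_cols, pre_matches = pre_matches))
```

If the first two values are TRUE, the scale argument only changed the colors, not the dendrograms.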

For your interest, this is the code from heatmap, that does the scaling:

if (scale == "row") {
    x <- sweep(x, 1L, rowMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 1L, sd, na.rm = na.rm)
    x <- sweep(x, 1L, sx, "/", check.margin = FALSE)
}
else if (scale == "column") {
    x <- sweep(x, 2L, colMeans(x, na.rm = na.rm), check.margin = FALSE)
    sx <- apply(x, 2L, sd, na.rm = na.rm)
    x <- sweep(x, 2L, sx, "/", check.margin = FALSE)
}

You will find that it comes after the code that does the clustering.
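To see that this sweep-based code gives the same numbers as scale, the column branch can be run by hand and compared. A minimal sketch, assuming a matrix without missing values (so na.rm is left out); the name manual is just for illustration.

```r
x <- as.matrix(mtcars)

# Column branch of the heatmap code: center, then divide by the column sd.
manual <- sweep(x, 2L, colMeans(x), check.margin = FALSE)
sx     <- apply(manual, 2L, sd)
manual <- sweep(manual, 2L, sx, "/", check.margin = FALSE)

# scale() attaches "scaled:center" and "scaled:scale" attributes,
# so compare the values only.
values_match <- isTRUE(all.equal(manual, scale(x), check.attributes = FALSE))
print(values_match)
```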

Edit: One more idea: it might be a good idea to scale both rows and columns before the analysis, which is not possible using the scale argument of heatmap.

Although scale centers and scales columns only, it can easily be used to scale both:

x <- scale(x) # scale and center columns
x <- t(scale(t(x))) # scale and center rows
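As a rough sketch of how this could be used on mtcars: after the second step the rows are exactly standardized, while the columns are in general only approximately so, because the row step undoes part of the column step. The variable names and the null graphics device are assumptions for illustration.

```r
x <- as.matrix(mtcars)
x <- scale(x)        # scale and center columns
x <- t(scale(t(x)))  # scale and center rows

# Rows now have mean 0 and sd 1 (up to floating-point error).
row_means <- rowMeans(x)
row_sds   <- apply(x, 1L, sd)

# Columns were standardized first, so they are usually no longer exact.
col_means <- colMeans(x)
print(summary(col_means))

# Cluster and plot the pre-scaled matrix; tell heatmap not to rescale.
pdf(NULL)  # null device; drop this line (and dev.off()) to see the plot
h <- heatmap(x, scale = "none")
invisible(dev.off())
```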

ADD COMMENT • link 13.6 years ago by Michael 56k

0

Thanks for the answer, but I am still confused. At the beginning of the heatmap code there is "scale <- if (symm && missing(scale)) "none" else match.arg(scale)". Does this command scale the data? And if the clustering does not use the scaled data, what is the point of the scale option in heatmap?

ADD REPLY • link 13.6 years ago by C Shao ▴ 140

1

That code only checks whether the parameter is set correctly. The scaling still affects the graphical output: it improves the color mapping, which would otherwise be dominated by the extreme values (try plotting without scaling, everything is red). I agree, it is not easy to see what else the benefit of scaling after clustering would be. I think it's often better to scale before clustering.
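For illustration, here is what match.arg does on its own. The function pick_scale is hypothetical and not part of heatmap; it only mirrors the way the scale argument is declared.

```r
# Hypothetical helper with the same choices as heatmap's scale argument.
pick_scale <- function(scale = c("row", "column", "none")) {
  # match.arg() validates the value against the choices in the signature;
  # it returns a string and never touches any data.
  match.arg(scale)
}

default_choice <- pick_scale()       # no value given: first choice is used
partial_choice <- pick_scale("col")  # partial matching is allowed

# An unknown value raises an error instead of being silently accepted.
bad_choice <- try(pick_scale("bogus"), silent = TRUE)

print(default_choice)
print(partial_choice)
print(inherits(bad_choice, "try-error"))
```

So that line only resolves which scaling mode was requested; the actual scaling is the sweep code further down in the function.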