Question

Scaling for pheatmap


3.1 years ago

bnayer26 • 0

I'm new to R and am making a heatmap of RNA sequencing data with pheatmap. My input is the log2 CPM of genes across 5 samples (samples in columns, genes in rows). First, should I scale my data with the scale() function? Second, should I set scale = "row" in the pheatmap call? Here is my code:

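library(pheatmap)
library(RColorBrewer)

# Read log2 CPM values; the first column holds the gene identifiers
heatmap_trial_2 <- read.csv("Final genes_log2CPM.csv")
heatmap_trial_2 <- data.frame(heatmap_trial_2[,-1], row.names=heatmap_trial_2[,1])
# Z-score each gene (row): scale() works on columns, hence the double transpose
sc_1 <- t(scale(t(heatmap_trial_2), center = TRUE, scale = TRUE))
pheatmap(sc_1, kmeans_k = NA, breaks = NA, scale = "none", cluster_rows = FALSE,
         cluster_cols = FALSE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = colorRampPalette(brewer.pal(9,"BuPu"))(100))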

I noticed that if I set scale = "row" in the pheatmap call, the heatmap looks exactly the same regardless of whether I set scale = TRUE or scale = FALSE in the scale() function. But if I set scale = TRUE in scale() and scale = "none" in pheatmap (the code given above), the plot is different. I am struggling to determine which of these is correct for my data. At what step should I perform the scaling? Any help would be highly appreciated, thanks!

variance p.heatmap scaling unit • 4.0k views



In pheatmap, the scale argument accepts only three values. Copied from the manual:

scale
character indicating if the values should be centered and scaled in either the row direction or the column direction, or none. Corresponding values are "row", "column" and "none"

If you set scale = T, it's always row wise scaling.

3.1 years ago by cpad0112 21k

score 0 · Answer 1 · 2022-06-24


3.1 years ago

ATpoint 88k

Scaling in the heatmap context usually means standardizing the expression data (usually the normalized counts on the log scale) so that each gene has a mean of zero and a standard deviation of one across samples. This is what you do in sc_1. This is useful because it lets you compare expression patterns between genes with very different expression levels. Here are more details: Scaling RNA-Seq data before clustering?
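As a minimal sketch of what this row-wise standardization does, here is a base R example on a small made-up matrix (the gene and sample names and all values are hypothetical, not taken from the question's data):

```r
# Toy log2 CPM matrix: 3 hypothetical genes (rows) x 4 hypothetical samples (columns)
mat <- matrix(c( 1,  2,  3,  4,
                10, 12, 14, 16,
                 5,  5,  6,  8),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# scale() standardizes columns, so transpose to put genes in columns,
# standardize, and transpose back
z <- t(scale(t(mat), center = TRUE, scale = TRUE))

# After scaling, each gene has mean 0 and standard deviation 1 across samples
row_means <- rowMeans(z)
row_sds   <- apply(z, 1, sd)

# gene1 and gene2 sit at very different levels but follow the same trend,
# so their standardized profiles coincide
profile_diff <- max(abs(z["gene1", ] - z["gene2", ]))
```

In this toy case gene2 is expressed at a much higher level than gene1, yet both rows would get the same colors in a heatmap of z, which is the sense in which scaling makes genes comparable.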

That means that if you scale externally, you don't have to scale again inside the heatmap function. I am not a pheatmap user, but scale = "none" appears reasonable to me.
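The observation in the question, that scale = "row" gives the same plot whether or not the data were scaled beforehand, is what one would expect if row scaling is the mean-zero, unit-SD transform described above: standardizing rows that are already standardized changes nothing. A minimal base R sketch, using a hypothetical helper row_z and made-up values rather than the question's data:

```r
# Hypothetical helper: z-score each row (gene) of a numeric matrix
row_z <- function(m) t(scale(t(m), center = TRUE, scale = TRUE))

set.seed(1)
# Made-up log2 CPM-like values: 6 genes x 5 samples
raw <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

once  <- row_z(raw)   # scaled externally, like sc_1 in the question
twice <- row_z(once)  # scaled a second time, like row scaling applied on top of sc_1

# Largest element-wise difference between one and two passes;
# it should be zero up to floating-point error
max_diff <- max(abs(once - twice))
```

So with sc_1 as input, row scaling inside the plotting function should hand the same numbers to the plot as no scaling at all. Any remaining visual difference would then have to come from how colors are mapped to values, not from the values themselves.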