I'm new to R and am making a heatmap for some RNA sequencing data using pheatmap. My input data is the log2CPM of genes across 5 samples (samples in columns, genes in rows). I want to understand, first, whether I should scale my data with the scale() function, and second, whether I should set scale = "row" in the pheatmap call. Here is my code:
library(pheatmap)
library(RColorBrewer)
heatmap_trial_2 <- read.csv("Final genes_log2CPM.csv")
# Use the first column (gene IDs) as row names
heatmap_trial_2 <- data.frame(heatmap_trial_2[,-1], row.names=heatmap_trial_2[,1])
# Row-wise z-score: scale() works on columns, hence the two transposes
sc_1 <- t(scale(t(heatmap_trial_2), center = TRUE, scale = TRUE))
pheatmap(sc_1, kmeans_k = NA, breaks = NA, scale = "none", cluster_rows = FALSE,
         cluster_cols = FALSE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = colorRampPalette(brewer.pal(9,"BuPu"))(100))
I noticed that if I set scale = "row" in the pheatmap call, the heatmap looks exactly the same regardless of whether I set scale = TRUE or scale = FALSE in the scale() function. But if I set scale = TRUE in the scale() function and scale = "none" in the pheatmap call (which is the code given above), the plot is different.
I am struggling to determine which of these is the correct way to do it for my data. At what step should I perform the "scaling"? Any help would be highly appreciated, thanks!
In pheatmap, the scale argument accepts only three values, as listed in the manual: "row", "column", and "none".
If you set scale = TRUE in your scale() call, the scaling is always row-wise, because you transpose the matrix before and after scaling.
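To see why scale = "row" in pheatmap hides the difference between scale = TRUE and scale = FALSE upstream, here is a small base-R sketch. It assumes that pheatmap's scale = "row" subtracts each row's mean and divides by each row's standard deviation, which matches how the manual describes it. The toy matrix and the helper row_z are made up for illustration and are not part of pheatmap.

```r
# Toy log2CPM-like matrix: 4 genes (rows) x 5 samples (columns); values are made up.
set.seed(1)
mat <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5)))

# Hypothetical helper: row-wise z-score, standing in for what scale = "row" is assumed to do.
row_z <- function(x) t(scale(t(x), center = TRUE, scale = TRUE))

# The two upstream variants from the question.
centered_only <- t(scale(t(mat), center = TRUE, scale = FALSE))  # scale = FALSE
z_scored      <- row_z(mat)                                      # scale = TRUE

# Row-scaling either variant again lands on the same matrix.
all.equal(row_z(centered_only), z_scored, check.attributes = FALSE)  # TRUE
all.equal(row_z(z_scored),      z_scored, check.attributes = FALSE)  # TRUE

# Without that second row scaling, the two variants are not the same matrix.
isTRUE(all.equal(centered_only, z_scored, check.attributes = FALSE))  # FALSE
```

So row scaling only needs to happen once: either scale the rows yourself and pass scale = "none", or pass the unscaled log2CPM matrix with scale = "row". Under the assumption above, both routes should give the same heatmap.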