Hi everybody, I'm trying to replicate this image from a paper (I have replicated all of its analyses except for this heatmap):
Caption from the paper : Heatmap of unsupervised clustering of platelet mRNA profiles of healthy donors (red) and patients with cancer (gray). Paper link : https://www.cell.com/cancer-cell/fulltext/S1535-6108(15)00349-9
What I did is the following :
library(edgeR)  # provides cpm()
data <- cpm(counts, prior.count = 1, log = TRUE)  # log2 counts per million
data.scaled <- scale(data, center = TRUE, scale = TRUE)  # I tried all the different scale types
data.scaled <- data.scaled[rownames(TopGenes), ]  # keep only some genes rather than all of them, as the paper did
# 1:300 because plotting all of them makes my computer slow
h <- pheatmap(data.scaled[1:300, ], cluster_rows = FALSE, name = "Z-score",
              cluster_cols = FALSE, annotation_col = info)
And this is what comes out :
Now the order of the samples is not important (I used a different ordering), but I don't understand how they computed the z-scores. They didn't specify the steps, so I guessed they first computed the CPM and then scaled, but the result doesn't look right.
That looks like a ComplexHeatmap with a top annotation (top_annotation). ComplexHeatmap clusters using
hclust(dist(mat))
by default. You may want to try ComplexHeatmap instead of pheatmap. I'd also recommend emailing the authors and requesting the exact code used to generate the heatmap.
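To make the default clustering concrete, here is a minimal base R sketch of what hclust(dist(mat)) does to the row and column order. The matrix `mat` and its values are made up for illustration. Note that the pheatmap call in the question sets cluster_rows = FALSE and cluster_cols = FALSE, so no clustering is applied there at all, which by itself could make the plot look different from an "unsupervised clustering" heatmap.

```r
# Toy matrix standing in for the scaled expression data (hypothetical values).
set.seed(42)
mat <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("s", 1:5)))

# dist() defaults to Euclidean distance between rows;
# hclust() defaults to complete linkage.
row_hc <- hclust(dist(mat))      # cluster genes (rows)
col_hc <- hclust(dist(t(mat)))   # cluster samples (columns)

# Reorder the matrix the way a clustered heatmap would display it.
mat_ordered <- mat[row_hc$order, col_hc$order]
print(round(mat_ordered, 2))
```

Passing `mat_ordered` to a heatmap function with clustering turned off should give the same row and column order as letting the function cluster with these defaults, although the dendrograms themselves would not be drawn.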
One of their supplemental files shows row scaling (https://www.cell.com/cms/10.1016/j.ccell.2015.09.018/attachment/b9cc502c-88f0-4326-9322-67e22687c5e9/mmc1). You could try that: currently you are scaling columns (samples), so try scaling rows (genes) instead.
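As a sketch of the difference: scale() standardizes the columns of a matrix, so with genes in rows it z-scores each sample. Transposing before and after gives per-gene (row) z-scores. The matrix `logcpm` below is a made-up stand-in for the log-CPM data, not the real data set.

```r
# Toy log-CPM matrix: genes in rows, samples in columns (hypothetical values).
set.seed(1)
logcpm <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5)))

# Applied directly, scale() centers and scales each column (sample).
col_scaled <- scale(logcpm)

# Transposing first makes scale() work on genes; transposing back
# restores the genes-by-samples layout.
row_scaled <- t(scale(t(logcpm)))

# After row scaling, each gene should have mean ~0 and SD ~1 across samples.
print(round(rowMeans(row_scaled), 10))
print(apply(row_scaled, 1, sd))
```

pheatmap also accepts scale = "row", which applies the same kind of per-row standardization inside the plotting call.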
Thank you for the reply. I tried row scaling too, but the result is still completely different from theirs. I have seen the supplemental files, but they still don't say how it was done. They just show another heatmap and say it refers to counts per million, which I also did (CPM, then scaling), but my values go far beyond 1.5 (up to about 10).
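One possible explanation, which is an assumption and not something the paper confirms, is that the legend range of about ±1.5 reflects a capped color scale rather than the true range of the z-scores. A minimal sketch of capping row z-scores before plotting, with a hypothetical `limit` and made-up values:

```r
# Hypothetical z-scores, including extreme values like the ones described above.
z <- c(-10, -2, -0.5, 0, 1.2, 3, 10)

# Assumed cap; the paper does not state that this was done.
limit <- 1.5

# Values beyond +/- limit are pulled back to the limit; the rest are unchanged.
z_clipped <- pmax(pmin(z, limit), -limit)
print(z_clipped)
```

The same pmin/pmax call works on a whole matrix, but it drops the dim attribute, so a form like `m[] <- pmax(pmin(m, limit), -limit)` is needed to keep the matrix shape. Whether the authors did anything like this is something only they can confirm.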