Question

How to scale raw Counts to obtain z-score Heatmap in R?

0

Entering edit mode

3.1 years ago

ali ▴ 20

Hi everybody, im trying to replicate this image from a paper (of which I have replicated all analysis except for this heatmap) :

enter image description here

Caption from the paper : Heatmap of unsupervised clustering of platelet mRNA profiles of healthy donors (red) and patients with cancer (gray). Paper link : https://www.cell.com/cancer-cell/fulltext/S1535-6108(15)00349-9

What I did is the following :

data <- cpm(counts,prior.count = 1,log=TRUE)
data.scaled <- scale(data,center = TRUE, scale = TRUE) #I tried all different scale types
data.scaled <- data.scaled[rownames(TopGenes),] #Choose just some genes so they are not all as the paper did
h <- pheatmap(normList.scaled[0:300,], cluster_rows=FALSE,name ="Z-score",
                cluster_cols=FALSE,annotation_col=info)#,annotation_col=info) #0:300 because all of them make my computer slow

And this is what comes enter image description here out :

Now the order of the samples is not important ( I did another order type ) , but i don't understand how they computed the z-scores. They didnt specify the steps so i guessed they first did the cpm and then scaled? But it doesn't look right.

GEO R • 4.3k views

ADD COMMENT • link 3.1 years ago by ali ▴ 20

0

Entering edit mode

That looks like a ComplexHeatmap with a topAnnotation. CH clusters using hclust(dist(mat)) by default. You may want to try using CH instead of pheatmap.

ADD REPLY • link 3.1 years ago by Ram 45k

0

Entering edit mode

I'd recommend emailing the authors and requesting the exact code used to generate the heatmap.

ADD REPLY • link 3.1 years ago by Ram 45k

0

Entering edit mode

One of their supplemental files show row scaling (https://www.cell.com/cms/10.1016/j.ccell.2015.09.018/attachment/b9cc502c-88f0-4326-9322-67e22687c5e9/mmc1). Probably you can try that. Currently you are doing column scaling, Try row scaling.

ADD REPLY • link 3.1 years ago by cpad0112 21k

0

Entering edit mode

Thank you for the reply, I tried row scaling too but still is completely different from their work. I have seen the supplemental files but is not said how to do it still. They just show another Heatmap saying that refers to the count per millions but I did it too (cpm and then scaling) but my values are way higher than 1.5 (like 10)

ADD REPLY • link 3.1 years ago by ali ▴ 20

score 1 · Answer 1 · 2022-05-03

1

Entering edit mode

3.1 years ago

ATpoint 88k

See my answer here on how-to and why one scales data for heatmaps Scaling RNA-Seq data before clustering?

ADD COMMENT • link 3.1 years ago by ATpoint 88k

0

Entering edit mode

Hey thanks for this link, it was really useful. The preprocessing step is the same I did but with filtering FDR < 0.001 because thats how they did it and logCPM > 3. I have retrived something like this :

enter image description here

It looks really similar now (just inverted , maybe the gene order is exactly my opposite idk). Could you explain to me why I do have 4 and -4 and they have 1.5 and -1.5? I read what you wrote on the other post and I got what z-score is but , if the data I'm using is the same , how is possible that the z-scores differ. Thats the code I use now (still have to apply fisher even if i dont know how) :

fit <- glmQLFit(list,design,robust = TRUE) #list is the DGEList
single.contrast <- makeContrasts(Tumor  - HC, levels = design)
test.anov <- glmQLFTest(fit,contrast = single.contrast)



Filtering data by logCPM > 3 and FDR < 0.001 #####

tagsTblANOVA <- topTags(test.anov, n=nrow(test.anov$table))$table
tagsTblANOVA.logcpm <- tagsTblANOVA[tagsTblANOVA$logCPM>3,]
tagsTblAnova.fdr <- tagsTblANOVA.logcpm[tagsTblANOVA.logcpm$FDR <0.001,]

column_ha = HeatmapAnnotation(condition = info$Condition)
DEGs <- rownames(tagsTblAnova.fdr) #filtered by FDR after fit and test.
cts <- cpm(list, log = TRUE)[rownames(list) %in% DEGs,] #list is the DGELIST obj
jpeg(file="h1.jpg")
Heatmap(matrix = t(scale(t(cts))),
           cluster_rows = TRUE,
           clustering_method_rows = "ward.D2",
           cluster_columns = TRUE,
           clustering_method_columns = "ward.D2",
           show_row_names = FALSE, 
           show_column_names = FALSE,
           column_title = "logcounts scaled",
    top_annotation = column_ha)
dev.off()

ADD REPLY • link 3.1 years ago by ali ▴ 20

1

Entering edit mode

Unless you do exactly the same code with exactly the same software versions and exactly the same everything, results will have nuance differences. Maybe they trimmed away some outliers from the heatmap color scale, can be many reasons. The message should be the same for the sake of reproducibility, that is what matters.

ADD REPLY • link 3.1 years ago by ATpoint 88k

0

Entering edit mode

thx before i close the post as solved, do you have any advise on how to implement fisher test on clustering into the Heatmap?

ADD REPLY • link 3.1 years ago by ali ▴ 20