How to scale raw Counts to obtain z-score Heatmap in R?
1
0
Entering edit mode
2.4 years ago
ali ▴ 20

Hi everybody, im trying to replicate this image from a paper (of which I have replicated all analysis except for this heatmap) :

enter image description here

Caption from the paper : Heatmap of unsupervised clustering of platelet mRNA profiles of healthy donors (red) and patients with cancer (gray). Paper link : https://www.cell.com/cancer-cell/fulltext/S1535-6108(15)00349-9

What I did is the following :

data <- cpm(counts,prior.count = 1,log=TRUE)
data.scaled <- scale(data,center = TRUE, scale = TRUE) #I tried all different scale types
data.scaled <- data.scaled[rownames(TopGenes),] #Choose just some genes so they are not all as the paper did
h <- pheatmap(normList.scaled[0:300,], cluster_rows=FALSE,name ="Z-score",
                cluster_cols=FALSE,annotation_col=info)#,annotation_col=info) #0:300 because all of them make my computer slow

And this is what comesenter image description here out :

Now the order of the samples is not important ( I did another order type ) , but i don't understand how they computed the z-scores. They didnt specify the steps so i guessed they first did the cpm and then scaled? But it doesn't look right.

GEO R • 3.4k views
ADD COMMENT
0
Entering edit mode

That looks like a ComplexHeatmap with a topAnnotation. CH clusters using hclust(dist(mat)) by default. You may want to try using CH instead of pheatmap.

ADD REPLY
0
Entering edit mode

I'd recommend emailing the authors and requesting the exact code used to generate the heatmap.

ADD REPLY
0
Entering edit mode

One of their supplemental files show row scaling (https://www.cell.com/cms/10.1016/j.ccell.2015.09.018/attachment/b9cc502c-88f0-4326-9322-67e22687c5e9/mmc1). Probably you can try that. Currently you are doing column scaling, Try row scaling.

ADD REPLY
0
Entering edit mode

Thank you for the reply, I tried row scaling too but still is completely different from their work. I have seen the supplemental files but is not said how to do it still. They just show another Heatmap saying that refers to the count per millions but I did it too (cpm and then scaling) but my values are way higher than 1.5 (like 10)

ADD REPLY
1
Entering edit mode
2.4 years ago
ATpoint 84k

See my answer here on how-to and why one scales data for heatmaps Scaling RNA-Seq data before clustering?

ADD COMMENT
0
Entering edit mode

Hey thanks for this link, it was really useful. The preprocessing step is the same I did but with filtering FDR < 0.001 because thats how they did it and logCPM > 3. I have retrived something like this :

enter image description here

It looks really similar now (just inverted , maybe the gene order is exactly my opposite idk). Could you explain to me why I do have 4 and -4 and they have 1.5 and -1.5? I read what you wrote on the other post and I got what z-score is but , if the data I'm using is the same , how is possible that the z-scores differ. Thats the code I use now (still have to apply fisher even if i dont know how) :

fit <- glmQLFit(list,design,robust = TRUE) #list is the DGEList
single.contrast <- makeContrasts(Tumor  - HC, levels = design)
test.anov <- glmQLFTest(fit,contrast = single.contrast)



Filtering data by logCPM > 3 and FDR < 0.001 #####

tagsTblANOVA <- topTags(test.anov, n=nrow(test.anov$table))$table
tagsTblANOVA.logcpm <- tagsTblANOVA[tagsTblANOVA$logCPM>3,]
tagsTblAnova.fdr <- tagsTblANOVA.logcpm[tagsTblANOVA.logcpm$FDR <0.001,]

column_ha = HeatmapAnnotation(condition = info$Condition)
DEGs <- rownames(tagsTblAnova.fdr) #filtered by FDR after fit and test.
cts <- cpm(list, log = TRUE)[rownames(list) %in% DEGs,] #list is the DGELIST obj
jpeg(file="h1.jpg")
Heatmap(matrix = t(scale(t(cts))),
           cluster_rows = TRUE,
           clustering_method_rows = "ward.D2",
           cluster_columns = TRUE,
           clustering_method_columns = "ward.D2",
           show_row_names = FALSE, 
           show_column_names = FALSE,
           column_title = "logcounts scaled",
    top_annotation = column_ha)
dev.off()
ADD REPLY
1
Entering edit mode

Unless you do exactly the same code with exactly the same software versions and exactly the same everything, results will have nuance differences. Maybe they trimmed away some outliers from the heatmap color scale, can be many reasons. The message should be the same for the sake of reproducibility, that is what matters.

ADD REPLY
0
Entering edit mode

thx before i close the post as solved, do you have any advise on how to implement fisher test on clustering into the Heatmap?

ADD REPLY

Login before adding your answer.

Traffic: 2596 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6