Question

heatmap pattern for rna

0

Entering edit mode

4.6 years ago

Rob ▴ 170

Hello friends, I try many ways to plot a heatmap. I do not know why I don't get any pattern and my heatmap looks very messy with randomly spread up and down regulated genes. Please help me. I used winscaled z-score for this. this is the code:

rdata <- read.table("my_data.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE)
library(pheatmap)

# Create metadata
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 66), rep("1", 66)))
colnames(sample_org) <- c("Group")
mat <- as(rdata, "matrix")


## Set colours
my_colour = list(
  Group = c("0" = "blue", "1" = "yellow"))

pheatmap(mat, symbreaks = FALSE,cluster_cols = FALSE,
         cluster_rows = TRUE, color = colorRampPalette(c("#f71616","#f71616","white",
                                                        "#1919d4", "#1919d4"))(100),annotation_col = sample_org, annotation_colors = my_colour)

rna-seq • 2.9k views

ADD COMMENT • link updated 4.6 years ago by ATpoint 88k • written 4.6 years ago by Rob ▴ 170

0

Entering edit mode

Your heatmap reflects your data. If your data does not have a pattern, your heatmap won't. Do you have an expected pattern that you see in the data?

ADD REPLY • link 4.6 years ago by Ram 45k

0

Entering edit mode

Yes, there should be a clear pattern between patients and normal people. Also differential gene expression analysis showed that these genes are significantly differentially expressed.

Is there any problem with clustering in my code?

ADD REPLY • link 4.6 years ago by Rob ▴ 170

0

Entering edit mode

Are you again plotting the raw data that are neither normalized nor log2-transformed as here? heatmap in R Where in the code do you Z-score the data? Did you normalize and log the data? What is winscaled?

ADD REPLY • link 4.6 years ago by ATpoint 88k

0

Entering edit mode

I have HT-seq count data, and I calculated z-score of them (to normalize data), I tried this but I did not get pattern on heatmap. Then I winscaled the z-score (any expression value >3 changed to 3, any expression value <-3 changed to -3). I used this data (winscaled z-score) in heatmap. still no pattern. I did not do log

ADD REPLY • link 4.6 years ago by Rob ▴ 170

0

Entering edit mode

That doesn't sound like scaling, it just sounds like setting arbitrary min/max limits. Heatmaps generally always look better on log-normalized data. You can then set scale="row" in the pheatmap call to actually generate z-scores, which is what gives those nice blocky chunks that everybody wants.

ADD REPLY • link 4.6 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

So, you mean I do log2 transformation and then scale ="row" instead of using z-score of winscaled z-score?

ADD REPLY • link 4.6 years ago by Rob ▴ 170

score 5 · Accepted Answer · 2020-10-06

5

Entering edit mode

4.6 years ago

ATpoint 88k

Ok, I think there are some things that have to be clearified:

1) Z-scoring is not a normalization in terms of compensating for the differences in library depth and composition. The Z-score simply indicates how much each sample of a count matrix deviates from the mean of all samples per gene. THis has the advantage that the scale for all genes is more or less the same. In contrast if you compare logFCs or read counts directly then very large values typically dominate the heatmap.

2) You must (this is not optional) first run a proper normalization before applying the Z-transformation. If you use DESeq2 you can simply use the output of either vst or rlog. These are already on log2-scale. Alternatively you can use the output of norm <- DESeq2::counts(dds, normalized=TRUE) and then put this to log2 scale like log2(norm+1).

3) Once you have the normalized values on log2 scale then apply the Z-scaling as Z <- t(scale(z(log2.values))). Z is then the input for your heatmap.

Without 2) your clustering is meaningless as the values are completely random due to lack of correction for library size differences. Without log-transformation the range of data (very low counts, very high counts) probably introduces quite some noise. Therefore usually you apply the Z-scaling to the log2-counts.

Please be sure to read existing threads, there are really many posts describing on how to create RNA-seq heatmaps here on Biostars.

ADD COMMENT • link 4.6 years ago by ATpoint 88k

0

Entering edit mode

Thanks ATpoint for the comprehensive clarifying.

About normalization with vst or rlog and DESeq2::counts(dds, normalized=TRUE) are these functions in-built to Deseq2 so if I run differential expression analysis with Deseq2 my data will be normalized during the differential analysis and can go directly to heatmap? or should I do the normalization for ht-seq count seperately?

ADD REPLY • link 4.6 years ago by Rob ▴ 170

0

Entering edit mode

Separately, as said above. Run vst/rlog on your dds object and take the output for heatmaps. Also see https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#heatmap-of-the-count-matrix

I would use vst output, convert to Z, subset for the differential genes (or the genes you are interested in), and then plot it.

ADD REPLY • link 4.6 years ago by ATpoint 88k

0

Entering edit mode

Hi I am trying to do log transformation with vst or rlog. both give me this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘design<-’ for signature ‘"DESeqResults", "formula"

This is part of my code:

rdata <- read.table("data.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE)

library(pheatmap)
library(DESeq2)

## Differential abundance
alpha <- 0.05 #set the cutoff value
beta <- 2 
## Create metadata
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 44), rep("1", 44)))
colnames(sample_org) <- c("Group")

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = sample_org,
                              design = ~Group)

dd <- DESeq(dds)

res <- results(dd)
#Saving results
write.csv(res,"res.csv")
#######################
rld <- rlog(res)   ################# this gives error

##########################
vsd <- vst(res)    ####################### this gives error