heatmap pattern for rna
1
0
Entering edit mode
4.6 years ago
Rob ▴ 170

Hello friends, I try many ways to plot a heatmap. I do not know why I don't get any pattern and my heatmap looks very messy with randomly spread up and down regulated genes. Please help me. I used winscaled z-score for this. this is the code:

rdata <- read.table("my_data.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE)
library(pheatmap)

# Create metadata
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 66), rep("1", 66)))
colnames(sample_org) <- c("Group")
mat <- as(rdata, "matrix")


## Set colours
my_colour = list(
  Group = c("0" = "blue", "1" = "yellow"))

pheatmap(mat, symbreaks = FALSE,cluster_cols = FALSE,
         cluster_rows = TRUE, color = colorRampPalette(c("#f71616","#f71616","white",
                                                        "#1919d4", "#1919d4"))(100),annotation_col = sample_org, annotation_colors = my_colour)
rna-seq • 2.9k views
ADD COMMENT
0
Entering edit mode

Your heatmap reflects your data. If your data does not have a pattern, your heatmap won't. Do you have an expected pattern that you see in the data?

ADD REPLY
0
Entering edit mode

Yes, there should be a clear pattern between patients and normal people. Also differential gene expression analysis showed that these genes are significantly differentially expressed.

Is there any problem with clustering in my code?

ADD REPLY
0
Entering edit mode

Are you again plotting the raw data that are neither normalized nor log2-transformed as here? heatmap in R Where in the code do you Z-score the data? Did you normalize and log the data? What is winscaled?

ADD REPLY
0
Entering edit mode

I have HT-seq count data, and I calculated z-score of them (to normalize data), I tried this but I did not get pattern on heatmap. Then I winscaled the z-score (any expression value >3 changed to 3, any expression value <-3 changed to -3). I used this data (winscaled z-score) in heatmap. still no pattern. I did not do log

ADD REPLY
0
Entering edit mode

That doesn't sound like scaling, it just sounds like setting arbitrary min/max limits. Heatmaps generally always look better on log-normalized data. You can then set scale="row" in the pheatmap call to actually generate z-scores, which is what gives those nice blocky chunks that everybody wants.

ADD REPLY
0
Entering edit mode

So, you mean I do log2 transformation and then scale ="row" instead of using z-score of winscaled z-score?

ADD REPLY
5
Entering edit mode
4.6 years ago
ATpoint 88k

Ok, I think there are some things that have to be clearified:

1) Z-scoring is not a normalization in terms of compensating for the differences in library depth and composition. The Z-score simply indicates how much each sample of a count matrix deviates from the mean of all samples per gene. THis has the advantage that the scale for all genes is more or less the same. In contrast if you compare logFCs or read counts directly then very large values typically dominate the heatmap.

2) You must (this is not optional) first run a proper normalization before applying the Z-transformation. If you use DESeq2 you can simply use the output of either vst or rlog. These are already on log2-scale. Alternatively you can use the output of norm <- DESeq2::counts(dds, normalized=TRUE) and then put this to log2 scale like log2(norm+1).

3) Once you have the normalized values on log2 scale then apply the Z-scaling as Z <- t(scale(z(log2.values))). Z is then the input for your heatmap.

Without 2) your clustering is meaningless as the values are completely random due to lack of correction for library size differences. Without log-transformation the range of data (very low counts, very high counts) probably introduces quite some noise. Therefore usually you apply the Z-scaling to the log2-counts.

Please be sure to read existing threads, there are really many posts describing on how to create RNA-seq heatmaps here on Biostars.

ADD COMMENT
0
Entering edit mode

Thanks ATpoint for the comprehensive clarifying.

About normalization with vst or rlog and DESeq2::counts(dds, normalized=TRUE) are these functions in-built to Deseq2 so if I run differential expression analysis with Deseq2 my data will be normalized during the differential analysis and can go directly to heatmap? or should I do the normalization for ht-seq count seperately?

ADD REPLY
0
Entering edit mode

Separately, as said above. Run vst/rlog on your dds object and take the output for heatmaps. Also see https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#heatmap-of-the-count-matrix

I would use vst output, convert to Z, subset for the differential genes (or the genes you are interested in), and then plot it.

ADD REPLY
0
Entering edit mode

Hi I am trying to do log transformation with vst or rlog. both give me this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘design<-’ for signature ‘"DESeqResults", "formula"

This is part of my code:

rdata <- read.table("data.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE)

library(pheatmap)
library(DESeq2)

## Differential abundance
alpha <- 0.05 #set the cutoff value
beta <- 2 
## Create metadata
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 44), rep("1", 44)))
colnames(sample_org) <- c("Group")

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = sample_org,
                              design = ~Group)

dd <- DESeq(dds)

res <- results(dd)
#Saving results
write.csv(res,"res.csv")
#######################
rld <- rlog(res)   ################# this gives error

##########################
vsd <- vst(res)    ####################### this gives error
ADD REPLY
2
Entering edit mode

Just read the manual.

ADD REPLY

Login before adding your answer.

Traffic: 2442 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6