Hello friends,
I have tried many ways to plot a heatmap, but I do not see any pattern: the heatmap looks very messy, with up- and down-regulated genes scattered randomly.
Please help me. I used winscaled z-scores for this.
this is the code:
Are you again plotting the raw data that are neither normalized nor log2-transformed, as in your earlier thread "heatmap in R"?
Where in the code do you Z-score the data? Did you normalize and log the data? What is winscaled?
I have HT-seq count data, and I calculated z-scores from them (to normalize the data). I tried this but did not get a pattern on the heatmap. Then I winscaled the z-scores (any value > 3 was set to 3, any value < -3 was set to -3) and used these winscaled z-scores in the heatmap. Still no pattern. I did not log-transform the data.
That doesn't sound like scaling, it just sounds like setting arbitrary min/max limits. Heatmaps almost always look better on log-normalized data. You can then set scale="row" in the pheatmap call to actually generate z-scores, which is what gives those nice blocky chunks that everybody wants.
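To illustrate the difference between capping and scaling, here is a minimal sketch. The matrix `norm_counts`, the sample names and the helper `cap` are made up for illustration and are not the data or code from this thread. The pheatmap call only runs if the package is installed, and it writes to a temporary PDF so it also works in a non-interactive session.

```r
# Capping ("winscaling" in this thread) only truncates extreme values.
# It does not centre or rescale anything.
cap <- function(x, limit = 3) pmax(pmin(x, limit), -limit)
print(cap(c(-5, -1, 0, 2, 7)))

# Hypothetical, already library-size-normalized counts (genes x samples).
norm_counts <- rbind(
  geneA = c(5, 8, 60, 75),
  geneB = c(900, 1100, 120, 100),
  geneC = c(30, 20, 35, 500)
)
colnames(norm_counts) <- c("ctrl1", "ctrl2", "pat1", "pat2")

# Log-transform first; the +1 avoids log2(0).
log_expr <- log2(norm_counts + 1)

# scale = "row" makes pheatmap centre and scale each gene (row z-scores)
# before drawing, so no manual z-scoring is needed here.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    log_expr,
    scale = "row",
    filename = file.path(tempdir(), "heatmap_sketch.pdf")
  )
}
```

Capping can still be useful for display, so that a few extreme z-scores do not wash out the colour range, but it should come after the log transformation and row scaling, not replace them.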
Ok, I think there are some things that have to be clarified:
1) Z-scoring is not a normalization in terms of compensating for the differences in library depth and composition.
The Z-score simply indicates how much each sample of a count matrix deviates from the mean of all samples per gene. This has the advantage that the scale for all genes is more or less the same. In contrast, if you compare logFCs or read counts directly, then very large values typically dominate the heatmap.
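A small toy example may make point 1) more tangible. The numbers below are invented: three genes with identical true expression in all samples, where samples S3 and S4 were simply sequenced twice as deeply. The helper name `row_z` is also just for illustration.

```r
# Toy counts: rows = genes, columns = samples.
# True expression is the same everywhere; only sequencing depth differs.
base_expr <- c(geneA = 10, geneB = 100, geneC = 1000)
depth     <- c(S1 = 1, S2 = 1, S3 = 2, S4 = 2)
counts    <- outer(base_expr, depth)

# Row-wise z-score: (value - gene mean) / gene standard deviation.
row_z <- function(m) t(scale(t(m)))
z_raw <- row_z(counts)

# Every gene now looks "down" in S1/S2 and "up" in S3/S4,
# although this is purely a library-size effect.
print(round(z_raw, 3))

# Dividing by the (here known) depth removes the artificial difference.
norm <- sweep(counts, 2, depth, "/")
print(norm)
```

So z-scoring raw counts puts genes on a common scale, but it leaves the depth difference in place and can even make it look like a biological pattern.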
2) You must (this is not optional) first run a proper normalization before applying the Z-transformation. If you use DESeq2 you can simply use the output of either vst or rlog. These are already on log2-scale. Alternatively you can use the output of norm <- DESeq2::counts(dds, normalized=TRUE) and then put this to log2 scale like log2(norm+1).
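For intuition only, here is a rough base-R sketch of the median-of-ratios idea behind DESeq2's size factors, followed by the log2(norm+1) step. The toy matrix and the helper name `size_factors` are made up for illustration. In a real analysis you would let DESeq2 do this for you (vst, rlog, or counts(dds, normalized=TRUE)) rather than re-implement it.

```r
# Hypothetical raw counts: rows = genes, columns = samples.
# S2 and S4 are S1 sequenced 2x and 3x as deeply (except for gene4).
counts <- cbind(
  S1 = c(10, 50, 200, 0, 1000),
  S2 = c(20, 100, 400, 2, 2000),
  S3 = c(12, 45, 210, 1, 900),
  S4 = c(30, 150, 600, 0, 3000)
)
rownames(counts) <- paste0("gene", 1:5)

# Median-of-ratios size factors:
# compare each sample to a per-gene geometric mean and take the median ratio.
size_factors <- function(counts) {
  log_counts   <- log(counts)
  log_geo_mean <- rowMeans(log_counts)     # log of the geometric mean per gene
  usable       <- is.finite(log_geo_mean)  # skip genes with a zero in any sample
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_mean[usable]))
  })
}

sf       <- size_factors(counts)
norm     <- sweep(counts, 2, sf, "/")  # depth-corrected counts
log_norm <- log2(norm + 1)             # log2 scale; +1 avoids log2(0)

print(sf)
print(round(log_norm, 2))
```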
3) Once you have the normalized values on log2 scale, apply the Z-scaling per gene as Z <- t(scale(t(log2.values))). The two transposes are needed because scale() works on columns, while the genes are in rows.
Z is then the input for your heatmap.
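As a quick sanity check of step 3), the sketch below applies the row-wise Z-scaling to a small invented matrix of log2 values (the object name `log2_values` and its numbers are hypothetical) and confirms that every gene ends up with mean 0 and standard deviation 1.

```r
# Hypothetical normalized log2 expression: rows = genes, columns = samples.
log2_values <- rbind(
  geneA = c(2, 4, 6, 8),
  geneB = c(10, 10, 12, 12),
  geneC = c(1, 3, 1, 3)
)
colnames(log2_values) <- paste0("S", 1:4)

# scale() standardizes columns, so transpose to put genes in columns,
# scale, and transpose back to genes x samples.
Z <- t(scale(t(log2_values)))

# Each gene (row) should now have mean 0 and standard deviation 1.
print(round(rowMeans(Z), 10))
print(apply(Z, 1, sd))
```

Note that a gene with identical values in all samples has a standard deviation of 0 and will come out as NaN, so such genes should be filtered out before plotting.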
Without 2) your clustering is meaningless as the values are completely random due to lack of correction for library size differences. Without log-transformation the range of data (very low counts, very high counts) probably introduces quite some noise. Therefore usually you apply the Z-scaling to the log2-counts.
Please be sure to read existing threads; there are many posts here on Biostars describing how to create RNA-seq heatmaps.
About normalization with vst or rlog and DESeq2::counts(dds, normalized=TRUE): are these functions built into DESeq2, so that if I run the differential expression analysis with DESeq2 my data will be normalized during the analysis and can go directly to the heatmap? Or should I normalize the HT-seq counts separately?
Your heatmap reflects your data. If your data does not have a pattern, your heatmap won't. Do you have an expected pattern that you see in the data?
Yes, there should be a clear pattern between patients and normal people. Also differential gene expression analysis showed that these genes are significantly differentially expressed.
Is there any problem with clustering in my code?
So, you mean I should do the log2 transformation and then use scale="row", instead of using the winscaled z-score?