Is it acceptable to row scale VST transformed data for heatmaps?
5.5 years ago

I have Variance Stabilized Transformed count data and I'm making heatmaps. I believe the scale() function in R does a log2 scaling and then centers the data via a Z-score. If the counts are already transformed via VST, is it acceptable to then scale the matrix rows again?

RNA-Seq heatmap variance stabilized transformation • 6.0k views

The base R scale() function does not apply a log transformation. With its default arguments (center = TRUE, scale = TRUE) it standardizes each column of the matrix; with other settings it only centers, only rescales, or does neither. Read the docs of the scale function you intend to use. Heatmaps are visualization tools, so any transformation that helps highlight and interpret structure in the data should be OK. The z-score indicates how many standard deviations a value lies from the mean. Without variance stabilization, z-scores would be useless because the variance would vary with the mean.
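To make the point about scale() concrete, here is a small sketch on a made-up matrix (the values and the names m, z and manual are only for illustration). It checks that the default call is a plain column-wise standardization with no logarithm involved.

```r
# Toy matrix: 3 rows (observations) x 2 columns (variables); values are made up
m <- matrix(c(1, 2, 3,
              10, 20, 30), ncol = 2)

# Defaults are center = TRUE and scale = TRUE, applied to each COLUMN
z <- scale(m)

# The same result by hand: subtract the column mean, divide by the column SD
manual <- sweep(sweep(m, 2, colMeans(m)), 2, apply(m, 2, sd), "/")

# No log is taken anywhere; the two agree up to floating-point error
stopifnot(isTRUE(all.equal(as.vector(z), as.vector(manual))))

colMeans(z)      # column means are (numerically) 0
apply(z, 2, sd)  # column SDs are 1
```

Calling scale(m, scale = FALSE) would only center, which is the "specific setting of its parameters" caveat in practice.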


t(scale(t(your.matrix_OR_data.frame))) is what you need for row-wise z-scoring, provided you have already applied a log2 transformation or any of the normalization techniques recommended for clustering/ML applications.
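A minimal sketch of that one-liner, assuming genes are in rows and samples in columns. The matrix x is random data standing in for VST or log2 values, not real counts, and the names row_z and manual are hypothetical.

```r
# Hypothetical matrix standing in for VST/log2 data: genes in rows, samples in columns
set.seed(1)
x <- matrix(rnorm(20, mean = 8, sd = 2), nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("sample", 1:5)))

# scale() works on columns, so transpose, scale, and transpose back
row_z <- t(scale(t(x)))

# The same thing written out: (value - gene mean) / gene SD
manual <- (x - rowMeans(x)) / apply(x, 1, sd)

# Identical up to floating-point error (attributes added by scale() are ignored)
stopifnot(isTRUE(all.equal(row_z, manual, check.attributes = FALSE)))
```

After this, every gene has mean 0 and SD 1 across samples, whatever its original spread was.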

5.5 years ago
Simon Anders ▴ 50

"Scaling" means multiplying by a scaling factor. However, after a VST or another log-like transformation, you should no longer multiply/divide, but rather add/subtract.

The typical thing done for heatmaps of log- or VS-transformed data is to subtract from each gene's values that gene's mean (taken over all samples). Then, use a diverging colour scale (e.g., red-white-blue with red for negative, white centered on zero and blue for positive), and the heatmap will show you, for each gene, in which samples the gene is expressed below its average (red) and where it is expressed above its average (blue).
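A rough sketch of the centering and colour scale described above, using only base R. The matrix vst_mat is random data standing in for real VST output, and the palette size of 101 colours is an arbitrary choice; any heatmap function that accepts a palette and breaks could be swapped in for the base heatmap() call.

```r
# Hypothetical VST-like matrix: genes in rows, samples in columns
set.seed(1)
vst_mat <- matrix(rnorm(30, mean = 8, sd = 1.5), nrow = 6,
                  dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

# Subtract each gene's mean over all samples (no division by SD)
centered <- vst_mat - rowMeans(vst_mat)

# Diverging palette: red = below average, white = average, blue = above average
pal <- colorRampPalette(c("red", "white", "blue"))(101)

# Symmetric breaks so that white sits exactly on zero
lim  <- max(abs(centered))
brks <- seq(-lim, lim, length.out = length(pal) + 1)

# Base-graphics heatmap with no further scaling and no clustering
heatmap(centered, Rowv = NA, Colv = NA, scale = "none", col = pal, breaks = brks)
```

The symmetric breaks matter: without them the white midpoint drifts away from zero whenever the data are skewed towards one side.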


At least heatmap.2 and pheatmap still perform the extra division by the standard deviation, though, i.e., after they have mean-centered the data. The result is exactly the same as running t(scale(t(x))) on the data. The proof is in this answer: cannot replicate the pheatmap scale function


But they don't scale by default, right? You have to choose that option. And I'd usually suggest only centering to mean 0 but not scaling to SD 1, so that you can still see which genes change a lot and which only fluctuate with a bit of noise.
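To illustrate the difference with numbers, here is a toy comparison of two invented genes on a log-like scale: one with a clear shift between groups and one that only wobbles. The values and the names strong and noisy are made up for this sketch.

```r
# Two made-up genes: one changes strongly, one only fluctuates with noise
x <- rbind(strong = c(4, 4, 4, 10, 10, 10),
           noisy  = c(7.0, 7.1, 6.9, 7.1, 6.9, 7.0))

centered <- x - rowMeans(x)   # mean 0 per gene, original spread kept
zscored  <- t(scale(t(x)))    # mean 0 and SD 1 per gene

apply(centered, 1, sd)  # the strong gene keeps a much larger SD than the noisy one
apply(zscored, 1, sd)   # both are forced to SD 1, so they look equally variable
```

On a heatmap of zscored, the noisy gene would be drawn with the same colour range as the strong one, which is why row scaling can make a heatmap look more dramatic than the underlying changes are.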


Yes, not default (just checked to confirm).


Hi Simon,

If I understand you correctly, it is not correct to calculate z-scores on log-transformed data, since a z-score also involves dividing by the SD, right? But a lot of people think it is good to log-transform the data before computing z-scores.

Besides, after doing the VST, I compared two pheatmap heatmaps: one with scale="row" and one that is only centered to mean 0 without dividing by the SD. The one with scale="row" shows more variation of genes between samples than the one that is only mean-centered. That is exactly what I would expect to see, but I am worried that I will over-interpret the results.


Is this effectively the same as using base::scale() to center but not scale across rows, e.g. t(scale(t(mat), scale = FALSE, center = TRUE))? I find this is nice for heatmaps where there are few or no count outliers, because it better preserves the relative count information between genes and also allows for some sample-to-sample comparisons.
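A quick check on random data suggests the two are the same operation. Here mat is a hypothetical stand-in for a VST matrix with genes in rows, and the names via_scale and via_rowmeans are only for illustration.

```r
# Hypothetical matrix: genes in rows, samples in columns
set.seed(42)
mat <- matrix(rnorm(12, mean = 9), nrow = 3)

# Center-only via scale(), applied row-wise through the double transpose
via_scale <- t(scale(t(mat), center = TRUE, scale = FALSE))

# Direct subtraction of each row's mean
via_rowmeans <- mat - rowMeans(mat)

# Same numbers; only the attributes attached by scale() differ
stopifnot(isTRUE(all.equal(via_scale, via_rowmeans, check.attributes = FALSE)))
```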
