z score transformation by population or by gene?
5.5 years ago
Pietro ▴ 240

In calculating z-scores for microarray or RNA-Seq data, I have found two main answers on how to obtain them.

For example, in R, having a log2 expression matrix x with genes in rows and samples in columns, I would do:

# z-score using the mean and SD of all values in x taken together
zscore <- function(x) {
  z <- (x - mean(x)) / sd(x)
  return(z)
}

But many people suggest using the base R scale() function on the transposed matrix instead, like this:

mat_zscore <- t(scale(t(x)))

If I am not mistaken, the two approaches are different: the first subtracts the population (global) mean and divides by the population SD of the whole matrix, while scale() operates on columns by default, so the transposition is needed to calculate the mean and SD for each gene (row).
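To make the difference concrete, here is a minimal sketch on a made-up matrix (the gene and sample names and the values are hypothetical, chosen only for illustration). It applies both approaches and checks what each one centres.

```r
# Toy log2 expression matrix: 3 hypothetical genes (rows) x 4 samples (columns)
x <- matrix(c( 2,  3,  4,  5,
              10, 11, 12, 13,
               6,  6,  7,  9),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# Global z-score: one mean and one SD computed over every value in the matrix
z_global <- (x - mean(x)) / sd(x)

# Per-gene z-score: scale() works on columns, so transpose, scale, transpose back
z_gene <- t(scale(t(x)))

# Per-gene scaling centres every gene at 0 with SD 1 ...
round(rowMeans(z_gene), 10)
apply(z_gene, 1, sd)

# ... whereas global scaling keeps the differences in average level between genes
rowMeans(z_global)
```

With per-gene scaling each gene is expressed relative to its own mean across samples; with global scaling a highly expressed gene stays high in every sample.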

My question is, is one of the two more correct than the other? And why are both given as valid alternatives?

Thanks

z score RNA-Seq microarray transformation • 8.2k views
5.5 years ago

They should give the same values. Here is my proof, taking the scaling functions from pheatmap() and heatmap.2() and comparing them to scale(): see the thread "cannot replicate the pheatmap scale function".

Keep in mind that scaling is normally done either by row or by column. Your function instead scales by the global mean and global standard deviation. In a typical transcriptomics setting, with genes in rows, scale(t(x)) scales each row (gene) of x, and the outer t() restores the original orientation.

Kevin

My question was more like: "Is it better to scale by global or by gene mean and SD?"

Can you show an example where global mean and global sdev were used?

Both answers in that thread are old, and the answers by Seán and dariober are different, as you have also highlighted in your question.

The scale() function always scales by column (you can make it scale by row with t(scale(t(x)))), so each column of the data is scaled separately. Scaling by row or column rather than globally may be preferable in certain situations, e.g., for visualisation. However, I have never seen a comprehensive review of why one approach would be preferable to the other. You may receive a better answer by posting on Cross Validated.
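As a rough check of the column-versus-row behaviour described above, here is a small sketch on simulated data (the seed, dimensions, and names are arbitrary assumptions). It compares the transpose trick against the same per-gene calculation written out by hand.

```r
set.seed(1)  # arbitrary seed so the toy data are reproducible
x <- matrix(rnorm(20, mean = 8, sd = 2), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

by_sample <- scale(x)        # default: centre and scale each column (sample)
by_gene   <- t(scale(t(x)))  # transpose trick: centre and scale each row (gene)

# The same per-gene result written out by hand; the length-5 vectors recycle
# down the columns, so element i is applied to row i
by_gene_manual <- (x - rowMeans(x)) / apply(x, 1, sd)

# scale() attaches "scaled:center"/"scaled:scale" attributes, so ignore them here
all.equal(by_gene, by_gene_manual, check.attributes = FALSE)

round(colMeans(by_sample), 10)  # each sample is centred
round(rowMeans(by_gene), 10)    # each gene is centred
```

The manual version makes explicit that row-wise scaling uses one mean and one SD per gene, which is what heatmap-style row scaling is generally expected to do.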
