I've seen the term being used in some papers working with gene expression data. I assume they are referring to z-score normalization of the expression matrix, but I would like to know if this is the right interpretation. Also, is this typically done over each gene vector (the rows of a traditional expression matrix) or over the samples (the columns)? Another question I have is whether it is always done one way or whether it depends on the downstream analysis that we want to perform. For example, I've been encountering the term in co-expression papers, where it is sometimes also referred to as "zero centering the expression matrix". What about PCA? I think that in R the function prcomp by default performs the normalization on the columns, but could you in some situations do it over the rows before PCA?
Thank you once again for your answers, Kevin. What still confuses me is that PCA does the transform column-wise, while the scRNA-seq example you mentioned would do it row-wise, to transform each gene across all cells. Is transforming column-wise the same as transforming row-wise? Intuitively I would say no, but from your answer I understood that it is the same.
It would not be the same to scale row-wise or column-wise. However, note that when we use prcomp(), we virtually always supply the rotated (transposed) input data so that it is ultimately the genes that are scaled.
But if for some analysis you wanted to do PCA with the samples as the features, would it be OK to do the z-score transformation row-wise (genes) and then again over the columns (samples) just before PCA? For example, some correction techniques have been tested for co-expression analysis in which you do PCA like this, then regress gene expression on the sample loadings as the independent variables, and proceed to the co-expression calculation afterwards. I've seen this in papers, but it is not explained in detail whether the genes are standardized and PCA is then performed with scaling over the columns in addition to that, or whether it is performed without scaling.
There is no right or wrong, and, technically, one does not have to standardise anything prior to performing PCA. Methods are almost always lacking in published works, too.
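To make the row-wise versus column-wise distinction from this thread concrete, here is a minimal sketch in plain Python rather than R. The matrix values are made up, the helper names (zscore, zscore_rows, zscore_cols) are hypothetical, and the sample standard deviation (n - 1) is an assumption chosen to mirror what R's scale() does. It only illustrates the arithmetic and is not a recommendation for any particular pipeline.

```python
from statistics import mean, stdev

def zscore(values):
    """Centre a vector on 0 and scale it to unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]

def zscore_rows(matrix):
    """Standardise each row (here: each gene across samples)."""
    return [zscore(row) for row in matrix]

def zscore_cols(matrix):
    """Standardise each column (here: each sample across genes)."""
    return transpose(zscore_rows(transpose(matrix)))

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Toy expression matrix (made-up values): 3 genes (rows) x 4 samples (columns).
expr = [
    [1.0, 2.0, 3.0, 10.0],
    [5.0, 5.5, 6.0, 6.5],
    [100.0, 80.0, 120.0, 90.0],
]

by_gene = zscore_rows(expr)      # each gene has mean 0, sd 1
by_sample = zscore_cols(expr)    # each sample has mean 0, sd 1
both = zscore_cols(by_gene)      # genes first, then samples

# The two directions give different matrices.
directions_differ = max_abs_diff(by_gene, by_sample) > 1e-6

# After the second (column-wise) pass, the genes are in general
# no longer at unit standard deviation.
gene_sds_after_both = [stdev(row) for row in both]

print("row-wise and column-wise results differ:", directions_differ)
print("gene sds after scaling genes, then samples:", gene_sds_after_both)
```

In this toy case, standardising the samples after the genes undoes the exact unit variance of the genes, so the order and direction of scaling are choices that affect the numbers going into PCA.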
Maybe I'm getting confused and scaling for PCA is just something done as part of the procedure and standardizing for expression matrices is something unrelated...