Hi all!
I am a beginner in R. I am currently working on gene expression data (NanoString). The data have been normalized, housekeeping genes and low-abundance genes have been removed, and the data have been log2 transformed.
I have done DE analysis with limma.
Question: do I need to use scale(data) before running the analysis? I tried it both with and without scaling, and the results are pretty similar, but there are some differences. The same goes for PCA: the results are almost the same, but not quite.
Main question: does one always have to use scale() on the data (even when it is already normalized and log2 transformed)? Can scale() ever "damage" the results/data?
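For concreteness, here is a minimal toy sketch of the kind of comparison I mean for the PCA part (simulated matrix, not my real data; `expr` is a hypothetical genes-by-samples log2 matrix):

```r
## Toy data: 200 genes x 12 samples on a log2-like scale
set.seed(1)
expr <- matrix(rnorm(200 * 12, mean = 8, sd = 2), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:12)))

## prcomp() expects samples in rows, hence t(); 'scale.' toggles per-gene
## unit-variance scaling. (Note that scale() itself works column-wise, so
## on a genes-x-samples matrix scale(expr) standardizes samples, not genes.)
pca_raw    <- prcomp(t(expr), center = TRUE, scale. = FALSE)  # covariance PCA
pca_scaled <- prcomp(t(expr), center = TRUE, scale. = TRUE)   # correlation PCA

## Scaling gives every gene equal weight, which is why the two versions
## come out similar but not identical; compare variance explained:
summary(pca_raw)$importance[2, 1:3]
summary(pca_scaled)$importance[2, 1:3]
```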
My take on it is that the beta values resulting from a linear model are easier to interpret when the data are scaled. In linear models, scaling shouldn't affect the results.
The coefficients are comparable when standardized, but their interpretation is not simpler, because the relation to the original measurement unit is lost. For example, a coefficient on weight is interpreted as how much the outcome changes per 1 kg change in the measured mass, whereas the standardized version is interpreted as how much the outcome changes per 1 standard deviation change in mass.
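A quick simulated illustration of that point: standardizing the predictor changes the coefficient (per-kg vs per-SD) but leaves the t-statistic and p-value untouched, so the test outcome is the same either way.

```r
## Simulated example: outcome depends weakly on mass (in kg)
set.seed(42)
mass_kg <- rnorm(100, mean = 70, sd = 12)
outcome <- 0.05 * mass_kg + rnorm(100)

fit_raw <- lm(outcome ~ mass_kg)         # beta = change in outcome per 1 kg
fit_std <- lm(outcome ~ scale(mass_kg))  # beta = change per 1 SD (~12 kg)

## Same t value and Pr(>|t|), different Estimate and Std. Error
coef(summary(fit_raw))["mass_kg", ]
coef(summary(fit_std))["scale(mass_kg)", ]
```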
Yes, that makes sense.
Just a side comment, as I have processed many NanoString datasets: the approach you're using is not incorrect; however, NanoString is a count-based method, and the analysis may therefore be more conducive to the use of edgeR or DESeq2, i.e., start from the raw NanoString counts and normalise these via edgeR or DESeq2. One can even specify housekeeper genes in this scenario (see the sketch below). nSolver is also a free, Windows-based GUI that can process NanoString data.
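To make that concrete, a hedged sketch of the DESeq2 route, assuming hypothetical placeholder objects `raw_counts` (genes x samples matrix of raw NanoString counts), `coldata` (sample table with a `condition` column), and `hk_genes` (character vector of housekeeper gene names):

```r
library(DESeq2)

## Build the dataset from raw counts, not the log2-transformed values
dds <- DESeqDataSetFromMatrix(countData = raw_counts,
                              colData   = coldata,
                              design    = ~ condition)

## Normalise against the housekeepers only: 'controlGenes' restricts the
## size-factor estimation to those rows
dds <- estimateSizeFactors(dds,
                           controlGenes = rownames(raw_counts) %in% hk_genes)

## DESeq() reuses the size factors already stored in 'dds'
dds <- DESeq(dds)
res <- results(dds)
head(res[order(res$padj), ])
```

An analogous route exists in edgeR; the key point in either case is starting from the raw counts rather than the already log2-transformed values.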