Here, the abundance measure can be a concentration, a molecule count or ppm; it is not microarray (chip) data.
I don't understand why we should log-transform these values when they are the y variable in a classification or regression model. Why is this done?
Could anybody help me? Thank you.
Apart from symmetrizing the scale of increases and decreases, I think taking the logarithm makes the distribution (roughly) Gaussian, so you can use parametric tests. Taking the log also, to some extent, takes care of heteroscedasticity (non-uniform, mean-dependent variance).
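A small simulation can illustrate the heteroscedasticity point. This is a sketch under one assumption: the measurement noise is multiplicative (log-normal), so the error grows with the mean. The function name `simulate` and all parameter values are made up for illustration.

```python
import math
import random
import statistics

# Assumption: multiplicative (log-normal) noise, a common but not universal
# model for abundance measurements.
random.seed(0)

def simulate(true_mean, n=2000, sigma=0.3):
    """Hypothetical helper: n noisy measurements whose error scales with the mean."""
    return [true_mean * math.exp(random.gauss(0.0, sigma)) for _ in range(n)]

low = simulate(10.0)
high = simulate(1000.0)

# Raw scale: the spread grows with the mean (heteroscedastic).
raw_sd_low = statistics.stdev(low)
raw_sd_high = statistics.stdev(high)

# Log scale: the spread is about the same for both groups.
log_sd_low = statistics.stdev([math.log(v) for v in low])
log_sd_high = statistics.stdev([math.log(v) for v in high])

print(f"raw SD: {raw_sd_low:.2f} vs {raw_sd_high:.2f}")
print(f"log SD: {log_sd_low:.3f} vs {log_sd_high:.3f}")
```

If the noise were additive instead (constant SD regardless of the mean), the log would not stabilize the variance in this way, so it is worth checking which pattern your data show.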
One practical reason is that increases and decreases end up on the same scale. 8*2 = 16, an increase of 8, while 8/2 = 4, a decrease of 4. On the log2 scale, 3+1 = 4 while 3-1 = 2: a change of 1 unit in both cases. For details about the possible benefits of log-transformation for variance and outliers, read this long but simple tutorial answer written by a statistician.
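The doubling/halving arithmetic above can be checked directly; a minimal sketch using log base 2:

```python
import math

start = 8
doubled = start * 2   # 16: an increase of 8 on the raw scale
halved = start / 2    # 4.0: a decrease of 4 on the raw scale

# On the log2 scale both changes have the same magnitude (one unit).
up = math.log2(doubled) - math.log2(start)
down = math.log2(halved) - math.log2(start)

print(doubled - start, halved - start)  # prints: 8 -4.0
print(up, down)                          # prints: 1.0 -1.0
```

Base 2 is used here only because it makes "one unit" mean "a two-fold change"; any base gives the same symmetry, just with a different unit size.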
When we focus on the difference before and after a change (e.g. comparing a sample with itself after over-expression), the log is useful. But for regression or classification, where I just need to predict the protein abundance, is this transform suitable? In addition, when I click the URL http://www.childrensmercy.org/stats/model/log.aspx, it redirects to the main page (http://www.childrensmercy.org/). How strange! Could you send the text of that page to my e-mail: zhilongjia@gmail.com. Thank you very much.
I think this is the original reference; see page 90 onwards: http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-JMP/JMP-part003.pdf
From the PDF:
1. Is your data bounded below by zero? (Yes)
2. Is your data defined as a ratio? (No)
3. Is the largest value in your data more than three times larger than the smallest value? (Yes)
So yes, I think I should log the y variable. It also answers my other question of which log to choose. Thank you for the material!
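Questions 1 and 3 of that checklist can be answered mechanically from the numbers. A minimal sketch; `log_checklist` and the example values are hypothetical, and question 2 is left out because whether a variable is "defined as a ratio" depends on how it was measured, not on the values themselves:

```python
def log_checklist(values):
    """Hypothetical helper for checklist questions 1 and 3.

    Returns (bounded_below_by_zero, max_more_than_3x_min).
    Note: zeros pass question 1 but cannot be logged directly.
    """
    bounded_below_by_zero = min(values) >= 0
    max_more_than_3x_min = max(values) > 3 * min(values)
    return bounded_below_by_zero, max_more_than_3x_min

abundances = [12.0, 150.0, 3.5, 980.0]  # made-up example values
print(log_checklist(abundances))  # prints: (True, True)
```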
What are the inputs and targets in the regression?
The inputs are features of the protein or mRNA, and the target is protein abundance.
If you log-transform the target value, you are assuming that a unit increase in a feature gives a multiplicative increase in the target. If you don't log-transform it, a unit increase in the feature has an additive effect. Which is the better assumption? Also, if you don't log-transform, your regression weights are going to be largely determined by the most abundant transcripts. You have to decide what is appropriate for your regression problem. (David's comments are also worth considering.)
"If you don't log-transform, your regression weights are going to be largely determined by the most abundant transcripts": I think this point is very important! Actually, I want to analyse how much of the variation in y the features explain. Thank you.
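Why the most abundant transcripts dominate can be seen from the least-squares loss itself: each point contributes its squared residual. A sketch under the assumption that every measurement is off by the same 10% relative error; the values and the helper `error_shares` are hypothetical, and this looks only at the loss, not at a full model fit.

```python
import math

# Hypothetical true abundances over four orders of magnitude, each observed
# with the same 10% relative error.
true_values = [1.0, 10.0, 100.0, 1000.0, 10000.0]
observed = [v * 1.1 for v in true_values]

def error_shares(residuals):
    """Fraction of the total squared error contributed by each point."""
    sq = [r * r for r in residuals]
    total = sum(sq)
    return [s / total for s in sq]

# Raw scale: residuals grow with abundance, so the largest value dominates.
raw_shares = error_shares([o - t for o, t in zip(observed, true_values)])
# Log scale: the same relative error gives the same residual for every point.
log_shares = error_shares([math.log(o) - math.log(t)
                           for o, t in zip(observed, true_values)])

print([round(s, 4) for s in raw_shares])
print([round(s, 4) for s in log_shares])
```

On the raw scale almost all of the loss comes from the single most abundant point, so a least-squares fit will mostly try to get that point right; on the log scale each point contributes equally.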