Question

Why Are Almost All Protein or mRNA Abundance Measures Log-Transformed?

12.5 years ago

Zhilong Jia ★ 2.2k

Here, the abundance measure can be a concentration, a molecule count or ppm; I am not asking specifically about microarray (chip) data.

I don't understand why these values are log-transformed when they are used as the y variable in a classification or regression model.

Could anybody help me? Thank you.

Tags: protein, mrna, statistics, normalization (4.5k views)

Written 12.5 years ago by Zhilong Jia ★ 2.2k; last updated 12.5 years ago by Woa ★ 2.9k

What are the inputs and targets in the regression?

Reply by Qdjm 1.9k, 12.5 years ago

The inputs are some features of the protein or mRNA, and the targets are protein abundances.

Reply by Zhilong Jia ★ 2.2k, 12.5 years ago

If you log-transform the target value, you are assuming that a unit increase in a feature value gives a multiplicative increase in the target value. If you don't log-transform it, a unit increase in the feature value gives an additive increase. Which is the better assumption? Also, if you don't log-transform, your regression weights are going to be largely determined by the most abundant transcripts. You have to decide what is appropriate for your regression problem. (David's comments are also worth considering.)

Reply by Qdjm 1.9k, 12.5 years ago
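To make Qdjm's two points concrete, here is a minimal Python sketch on made-up data (the values, `xs`, `ys` and the helper `ols` are hypothetical, not from the thread). The target is built so that each unit of the feature doubles it. A straight-line fit on the log2 scale recovers a slope of 1, i.e. a 2-fold change per unit, while a fit on the raw scale gives an additive slope, and in this example its largest squared residual comes from the most abundant point.

```python
import math

# Hypothetical toy data: each unit increase in the feature x doubles the target y,
# i.e. the true relationship is multiplicative (y = 3 * 2**x).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [3 * 2 ** x for x in xs]


def ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Log2 target: the slope is a log2 fold change per unit of x (multiplicative model).
a_log, b_log = ols(xs, [math.log2(y) for y in ys])
print(f"log2 fit: slope {b_log:.3f}, i.e. y is multiplied by {2 ** b_log:.2f} per unit of x")

# Raw target: the slope is an additive change in y per unit of x.
a_raw, b_raw = ols(xs, ys)
sq_resid = [(y - (a_raw + b_raw * x)) ** 2 for x, y in zip(xs, ys)]
worst = max(range(len(xs)), key=lambda i: sq_resid[i])
print(f"raw fit: slope {b_raw:.2f}; largest squared residual at y = {ys[worst]}")
```

On real data neither fit will be exact; the sketch only shows which kind of change (multiplicative or additive) the fitted slope describes under each choice.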

"If you don't log-transform, your regression weights are going to be largely determined by the most abundant transcripts": I think this point is very important! Actually, I want to analyse how much of the variation in y the features explain. Thank you.

Reply by Zhilong Jia ★ 2.2k, 12.5 years ago

12.5 years ago

David Quigley 11k

One practical reason is to put increases and decreases on the same scale. 8*2 = 16 is an increase of 8, while 8/2 = 4 is a decrease of only 4. On the log2 scale, 3+1 = 4 while 3-1 = 2: a change of 1 unit in both cases. For details about the possible benefits of log transformation for variance and outliers, read this long but simple tutorial answer written by a statistician.
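A quick check of David's arithmetic in Python, assuming base-2 logarithms (log2(8) = 3 matches the numbers in the answer):

```python
import math

base = 8
up, down = base * 2, base / 2  # a 2-fold increase and a 2-fold decrease

# Raw scale: the same fold change looks asymmetric (+8 versus -4).
print(up - base, down - base)  # 8 -4.0

# Log2 scale: both changes are exactly one unit (+1 versus -1).
print(math.log2(up) - math.log2(base), math.log2(down) - math.log2(base))  # 1.0 -1.0
```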

When we focus on the difference before and after a change (such as comparing an over-expressed sample with itself before over-expression), the log is useful. But for regression or classification, where we just need to predict the protein abundance, is this transform suitable? Also, clicking the URL http://www.childrensmercy.org/stats/model/log.aspx redirects to the main page (http://www.childrensmercy.org/). How strange! Could you send the text of that page to my e-mail: zhilongjia@gmail.com? Thank you very much.

Reply by Zhilong Jia ★ 2.2k, 12.5 years ago

I think this is the original reference; see page 90 onwards: http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-JMP/JMP-part003.pdf

Reply by Woa ★ 2.9k, 12.5 years ago

From the PDF:

1. Is your data bounded below by zero? (Yes)
2. Is your data defined as a ratio? (No)
3. Is the largest value in your data more than three times larger than the smallest value? (Yes)

So I think I should log-transform the y variable. The PDF also answers another question of mine: which log base to choose. Thank you for the material!

Reply by Zhilong Jia ★ 2.2k, 12.5 years ago
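Questions 1 and 3 of that checklist can be answered directly from the data. Below is a minimal sketch with a hypothetical helper, `log_checklist`, which is not taken from the PDF. Question 2 depends on how the measurement is defined, so it is left to the analyst. The max/min ratio, like the log itself, only makes sense when every value is strictly positive.

```python
def log_checklist(values):
    """Check the two numeric questions from the rule of thumb quoted above.

    Hypothetical helper, not taken from the PDF. Question 2 ("is your data
    defined as a ratio?") depends on how the measurement is defined, so it
    cannot be answered from the numbers alone and is not checked here.
    """
    lo, hi = min(values), max(values)
    return {
        # Q1: is the data bounded below by zero (no negative values)?
        "bounded_below_by_zero": lo >= 0,
        # Q3: is the largest value more than three times the smallest?
        # Only meaningful when every value is strictly positive.
        "max_over_min_gt_3": lo > 0 and hi / lo > 3,
    }


# Made-up abundances spanning about 400-fold.
print(log_checklist([12.0, 340.5, 5.2, 2100.0]))
# {'bounded_below_by_zero': True, 'max_over_min_gt_3': True}
```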

score 3 · Accepted Answer · 2012-06-08

12.5 years ago

Woa ★ 2.9k

Apart from making increases and decreases symmetric, I think taking the logarithm makes the distribution Gaussian(ish), so you can use parametric tests. Taking the log also, to some extent, takes care of heteroscedasticity (non-uniform, mean-dependent variance).
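A small simulation of the heteroscedasticity point, assuming (hypothetically) that measurement noise is multiplicative, about 20% at every abundance level. On the raw scale the standard deviation grows with the mean; on the natural-log scale it stays roughly constant (about 0.2). Because the noise here is log-normal by construction, the logged values are exactly Gaussian; real abundance data will only be approximately so.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical simulation: three "transcripts" with very different true abundances,
# each measured with the same ~20% multiplicative (log-normal) noise.
true_levels = [10, 1_000, 100_000]
reps = {mu: [mu * math.exp(random.gauss(0, 0.2)) for _ in range(2000)] for mu in true_levels}

for mu, xs in reps.items():
    raw_sd = statistics.stdev(xs)                       # grows with the mean
    log_sd = statistics.stdev([math.log(x) for x in xs])  # roughly 0.2 at every level
    print(f"mean~{mu:>7}: raw SD = {raw_sd:10.2f}   log SD = {log_sd:.3f}")
```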

Thank you for your answer. I ran the regression, and without the log the R² is a little higher. So which one should I use? I'm confused.

Reply by Zhilong Jia ★ 2.2k, 12.5 years ago

The R² is higher with the non-log version because of the larger range of values; that doesn't mean that you should be using log.

Reply by Qdjm 1.9k, 12.5 years ago
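One way to see why the two R² values should not be compared directly: each is the fraction of variance explained on its own scale, and on the raw scale that variance is dominated by the largest abundances. A minimal sketch on made-up data (the values and the helper `r_squared` are hypothetical), where the raw-scale R² comes out higher mostly because of a single large value:

```python
import math

# Hypothetical toy data: one feature x and an abundance y spanning three orders of magnitude.
x = [1, 2, 3, 4, 5]
y = [10, 1, 100, 10, 1000]


def r_squared(x, y):
    """R^2 of a simple least-squares line y = a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)


r2_raw = r_squared(x, y)                           # ~0.52, mostly driven by the value 1000
r2_log = r_squared(x, [math.log10(v) for v in y])  # ~0.48
print(f"R^2 raw = {r2_raw:.3f}, R^2 log10 = {r2_log:.3f}")
```

Dropping the single value 1000 from this toy data lowers the raw-scale R² to about 0.07, which is the sense in which one abundant point drives it.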

Does this mean that with log the range will be lower? And "that doesn't mean that you should be using log": does that mean I could use the non-log version?

Reply by Zhilong Jia ★ 2.2k, 12.5 years ago