Question

Which transformation should we use for expression-level data?

0

10.0 years ago

M K ▴ 660

Hi All,

I have expression read counts, and there are many zeros in the data. Which transformation is best for expression-level data like this? I found that some people use log10 and others use log2.

next-gen RNA-Seq R • 7.5k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by M K ▴ 660

Ram · Answer 1 · 2014-12-08

1

10.0 years ago

Manvendra Singh ★ 2.2k

Use DESeq2. It handles zero values in the data and works on the log2 scale as well. Using log10 would not be a good idea.

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by Manvendra Singh ★ 2.2k
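For readers wondering why zeros are a problem at all: log2(0) is undefined, so a plain log transform fails on zero counts. One common workaround is to add a small pseudocount before taking the log. The sketch below is only an illustration of that idea; the pseudocount of 1, the function name, and the counts are assumptions chosen for the example, and it is not meant to reproduce what DESeq2 does internally.

```python
import math

def log2_with_pseudocount(counts, pseudocount=1.0):
    """Return log2(count + pseudocount) for each raw count.

    The pseudocount keeps zero counts finite: log2(0 + 1) = 0.
    """
    return [math.log2(c + pseudocount) for c in counts]

# Hypothetical read counts for one gene across five samples
counts = [0, 1, 3, 7, 1023]
transformed = log2_with_pseudocount(counts)
print(transformed)  # [0.0, 1.0, 2.0, 3.0, 10.0]
```

Note that the pseudocount compresses differences among very low counts, so this kind of transform is usually better suited to plotting and clustering than to differential-expression testing.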

0

Could you please tell me why log2 is better than log10?

ADD REPLY • link 10.0 years ago by M K ▴ 660

1

It's not about better or worse. It's just a way of expressing your values.

E.g., if your gene is 8-fold upregulated, then the log2 value would be 3 and the log10 value would be about 0.9.

If the majority of your genes have fold changes greater than 10 (which is very unlikely), then you could represent them with log10, so that the displayed values start at 1.

ADD REPLY • link 10.0 years ago by Manvendra Singh ★ 2.2k
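The arithmetic in the reply above can be checked in a few lines. This is a minimal sketch using the 8-fold example from the thread; it also shows that the two scales differ only by a constant factor, log10(2), so the choice of base changes the units and not the information.

```python
import math

fold_change = 8.0  # the 8-fold upregulation example from the reply

log2_fc = math.log2(fold_change)    # 3.0
log10_fc = math.log10(fold_change)  # about 0.903

# Change of base: log10(x) = log2(x) * log10(2), for any x > 0
ratio = log10_fc / log2_fc

print(log2_fc, round(log10_fc, 3), round(ratio, 4))
```

On the log2 scale, each unit is one doubling, which is why it tends to be the easier scale to read for fold changes.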

0

This is a small portion of the read-count data that I have, before any transformation. Is it okay to use log2 for it?

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by M K ▴ 660

0

Yes, DESeq2 would do it in its pipeline before comparing, if you are looking for DEGs.

Low counts would be discarded.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 10.0 years ago by Manvendra Singh ★ 2.2k

3

Just in case someone comes across this later and wants to know more about this than a biologist probably ever should:

It's not that DESeq2 converts the counts to the log2 scale, but rather that it fits the data with a model using a log2 link (this is the case for many, many tools). Why? A couple of reasons, really. Firstly, it makes the math much easier. For example, when not using a log2 link, coefficients multiply, meaning they can quickly get very large or very small. This can quickly lead to loss of precision, especially as a coefficient approaches 0, since no one uses infinite-precision math for anything that needs to be quick. On the log2 scale the coefficients simply sum. Further, the range on the log2 scale changes from [0, infinity] to [-infinity, infinity]. This is convenient for optimization, the class of methods that tends to actually be used to perform the maximization in "maximum likelihood estimation" (actually minimization, but that's a different post...). You can do bounded optimization, but it's simpler to have an infinite range.

BTW, these are the same reasons we do logistic (or probit) regression. A logit converts the range [0, 1] to [-infinity, +infinity] and has all of the other benefits.

ADD REPLY • link 10.0 years ago by Devon Ryan 104k
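The points in the comment above (effects multiply on the raw scale but add on the log scale, products of small numbers lose precision, and a logit stretches [0, 1] onto the whole real line) can be illustrated with plain floating-point arithmetic. This is only a sketch with made-up effect sizes, not a model fit and not how any particular tool is implemented. It assumes Python 3.8 or later for `math.prod`.

```python
import math

# Hypothetical multiplicative effects (fold changes) on the count scale
effects = [0.5, 4.0, 0.25]

product = math.prod(effects)                   # effects multiply on the raw scale
log_sum = sum(math.log2(e) for e in effects)   # and simply add on the log2 scale
assert 2 ** log_sum == product                 # both describe the same net effect

# Precision: the product of two tiny factors underflows to 0.0 in double
# precision, while the sum of their logs is an ordinary finite number.
tiny = [1e-200, 1e-200]
underflow = math.prod(tiny)
log_tiny = sum(math.log2(t) for t in tiny)

def logit(p):
    """Map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

print(product, log_sum)               # 0.5 -1.0
print(underflow, round(log_tiny, 2))
print(logit(0.001), logit(0.5), logit(0.999))
```

The last line shows the logit growing without bound in both directions as the probability approaches 0 or 1, which is the unbounded range that makes unconstrained optimization possible.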

0

It's an answer, Devon, and a very good one :)

ADD REPLY • link 10.0 years ago by Manvendra Singh ★ 2.2k

0

Actually, I'm making yours an answer and up-voting it :) Mine is more of an overly long aside!

ADD REPLY • link 10.0 years ago by Devon Ryan 104k

0

Note: this is an off-topic comment :)

I like the way you help here and on SeqAnswers, Devon (y). You're in Bonn, I'm in Berlin. I hope we'll meet someday :)

ADD REPLY • link 10.0 years ago by Manvendra Singh ★ 2.2k

0

It's not that big of a country, so the odds of bumping into each other at some point are pretty high!

ADD REPLY • link 10.0 years ago by Devon Ryan 104k