Question

How to convert log2 scale RNA-Seq expression data to linear scale data

1

3.3 years ago

mohammedtoufiq91 ▴ 260

Hi,

We have run a pilot RNA-Seq study, and I used the edgeR package to obtain differential expression results. The output contains a gene column along with logCPM, logFC and p-value columns. For certain analyses I need to convert the log2-scale values to the linear scale. How can this be accomplished? Is there an R package to do this?

For instance, can I use base R operators as shown below?

Linear FC value = 2^(logFC)

Linear CPM value = 2^(logCPM)

Thank you,

Toufiq

expression edgeR R rna-seq fold-change • 5.2k views

updated 3.3 years ago by rpolicastro 13k • written 3.3 years ago by mohammedtoufiq91 ▴ 260

2

That's correct, you can just exponentiate the values with base 2.

Replied 3.3 years ago by rpolicastro 13k

0

rpolicastro, thank you very much for the response. Another point: do I have to include -1 in the formula, i.e. Linear FC value = 2^(logFC-1), if the log2 transformation during the logFC calculation was performed as logFC = log2(matrix+1) to avoid NAs?

Replied 3.3 years ago by mohammedtoufiq91 ▴ 260

0

No, you absolutely do not need to include -1 in the formula, and it would be quite wrong to do so. The logFC values from edgeR are computed using a sophisticated generalized linear model algorithm that takes a lot of things into account. You should not make ad hoc hacks to change the results from edgeR.
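
A quick arithmetic check may make the point concrete. Subtracting 1 on the log2 scale is the same as dividing by 2 on the unlogged scale, so 2^(logFC-1) would simply halve every fold-change rather than undo any offset. The values below are made up for illustration and do not come from edgeR.

```r
# Made-up log2 fold-changes, for illustration only
logFC <- c(-1, 0, 1, 3)

correct <- 2^logFC        # plain unlogging: 0.5, 1, 2, 8
wrong   <- 2^(logFC - 1)  # with the extra -1: 0.25, 0.5, 1, 4

# The -1 version is exactly half of the correct fold-change for every gene
all.equal(wrong, correct / 2)
```

Note that a gene with logFC = 0 (no change) would be reported as a fold-change of 0.5 under the -1 formula, which is clearly not what is wanted.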

Replied 3.3 years ago by Gordon Smyth ★ 7.7k

score 4 · Accepted Answer · 2021-08-22

The logFC and logCPM values output by edgeR are on a log2 scale, so yes, you can unlog them using the formulas you give (as rpolicastro already confirmed in his reply).
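
As a minimal sketch in base R, the unlogging is just `2^` applied to each column; no extra package is needed. The data frame `res` below is a hypothetical stand-in with made-up values. In a real analysis it would be the results table you obtained from edgeR.

```r
# Hypothetical stand-in for an edgeR results table (values are made up)
res <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),
  logFC  = c(1, -2, 0),
  logCPM = c(3, 5.5, 0)
)

# Both columns are on a log2 scale, so raise 2 to the power of each value
res$FC  <- 2^res$logFC    # unlogged fold-change
res$CPM <- 2^res$logCPM   # unlogged counts per million

print(res)
```

A negative logFC becomes a fold-change between 0 and 1 (here logFC = -2 gives 0.25), not a negative number.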

Let me say philosophically though that I don't agree that unlogged values can be described as "linear". Expression results are usually best analysed on a log scale and unlogged values do not behave in a linear fashion for most purposes.
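
One small example of that non-linear behaviour, using made-up numbers: a 2-fold increase and a 2-fold decrease cancel on the log scale, but averaging the unlogged fold-changes suggests a net increase.

```r
# A 2-fold increase and a 2-fold decrease, on the log2 scale
logFC <- c(1, -1)

mean_log <- mean(logFC)      # 0: the two changes cancel out
fc <- 2^logFC                # unlogged fold-changes: 2 and 0.5
mean_unlogged <- mean(fc)    # 1.25: looks like a net 25% increase

# Unlogging the mean of the logs gives 1, the geometric mean of the fold-changes
2^mean_log
```

This asymmetry between up- and down-regulation is one reason summaries and plots are usually made on the log scale.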

Note it is especially important not to try to undo edgeR's logFC moderation, as mediated by the prior.count values, if you are working on an unlogged scale. If you really did want unmoderated fold-changes then you would simply set prior.count=0 in the call to exactTest instead of doing your own ad hoc hacking. I can't imagine how that could be a good idea however. Getting infinite fold-changes from tiny counts (0 vs 1 for example) does not help anyone.
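
To see why unmoderated fold-changes from tiny counts are unhelpful, here is a deliberately simplified sketch. The helper `naive_logfc` is hypothetical and is not how edgeR computes logFC (edgeR's moderation is done inside its own model fitting); it only shows the effect of adding a small prior count to both groups. The value 0.125 is used purely as an illustrative prior.

```r
# Simplified illustration only: NOT edgeR's actual computation.
# naive_logfc is a hypothetical helper that adds the same small
# prior count to both groups before taking the log2 ratio.
naive_logfc <- function(a, b, prior = 0) {
  log2((a + prior) / (b + prior))
}

# 1 count versus 0 counts with no prior: infinite log fold-change
unmoderated <- naive_logfc(1, 0)

# Same counts with a small prior: large but finite, log2(9), about 3.17
moderated <- naive_logfc(1, 0, prior = 0.125)

c(unmoderated = unmoderated, moderated = moderated)
```

Unlogging the unmoderated value gives `2^Inf = Inf`, which cannot be ranked or plotted sensibly, whereas the moderated value unlogs to a fold-change of 9.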