Question

Different logFC (log2foldchange) values for genes from limma-voom and other tools (edgeR and DESeq2)



7.2 years ago

SMILE ▴ 190

I used different tools to do differential expression (DE) analysis. In the results I see a systematic difference between the logFC values from limma-voom and those from edgeR and DESeq2. Other people have reported differences in logFC between edgeR and DESeq2 (and received good answers), but in my case edgeR and DESeq2 give very similar results, while limma-voom gives quite different ones.

I found the following (I'm not sure it's right; I saw it somewhere):

limma's logFC = mean(log2(Group1)) - mean(log2(Group2))

"Actual" logFC = log2(mean(Group1) / mean(Group2))
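The two formulas can be compared on toy numbers. This is a minimal sketch using hypothetical normalized values for a single gene (not data from the figure), and it ignores the weighting and modelling the real tools do; it only shows that a mean of logs and a log of means can disagree. Because the mean of log2 values is the log2 of the geometric mean, which never exceeds the arithmetic mean, the two estimates diverge when one group is much more variable than the other.

```python
import math
from statistics import mean

# Hypothetical normalized values for one gene: group1 is highly variable,
# group2 is constant.
group1 = [100.0, 400.0, 1600.0]
group2 = [100.0, 100.0, 100.0]

def logfc_mean_of_logs(g1, g2):
    """Difference of mean log2 values (the 'limma-style' form quoted above)."""
    return mean(math.log2(x) for x in g1) - mean(math.log2(x) for x in g2)

def logfc_log_of_means(g1, g2):
    """log2 of the ratio of group means (the 'actual' form quoted above)."""
    return math.log2(mean(g1) / mean(g2))

print(logfc_mean_of_logs(group1, group2))  # ≈ 2.0   (geometric mean 400 vs 100)
print(logfc_log_of_means(group1, group2))  # ≈ 2.807 (arithmetic mean 700 vs 100)
```

With equally variable groups the two forms agree much more closely, so a systematic gap between tools is a hint that within-group variability (or low counts) is driving it.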

Can this seemingly small difference in the formulas cause a general, large difference between limma-voom and the other tools (edgeR and DESeq2)?

Which logFC is better to trust?

Below is a figure comparing the logFC values from the different tools:

[Figure: comparison of logFC values from limma-voom, edgeR and DESeq2]

Tags: RNA-Seq, next-gen, gene

Written by SMILE; updated by Friederike.

score 0 · Answer 1 · 2017-09-29



7.2 years ago

SMILE ▴ 190

Maybe I should post this on the Bioconductor support site?


score 0 · Answer 2 · 2017-09-29

limma, edgeR and DESeq2 use different ways to estimate the expression values from the read counts, and they use these estimates both for the DE analysis and for the log2FC.

The basic steps of the DE tools are:

1. normalization for differences in sequencing depth
2. turning the very few read counts per gene per condition (usually only 2 or 3 replicates!) into values that will work with downstream statistical tests
3. applying the statistical tests
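Step 1 can be illustrated with the median-of-ratios idea behind DESeq2's default size factors. This is a minimal plain-Python re-implementation for illustration only, not DESeq2's code; edgeR uses a different method (TMM). The counts are invented, with sample B sequenced twice as deeply as sample A.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean would be zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        # Log of the gene's geometric mean across samples (pseudo-reference).
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor of each sample = median ratio to the pseudo-reference.
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical counts: columns are samples A and B; B has twice the depth.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],  # skipped when estimating size factors (zero in sample A)
]
sf = size_factors(counts)
normalized = [[c / s for c, s in zip(gene, sf)] for gene in counts]
print(sf)          # ≈ [0.7071, 1.4142]
print(normalized)
```

After dividing by the size factors, the first three genes have identical normalized counts in both samples, which is what we want when the only difference between the samples is sequencing depth.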

The tools have different solutions for all three steps. While edgeR and DESeq2 are (nowadays) fairly similar, both modelling the counts with a negative binomial distribution, limma uses the voom transformation for step 2: it converts the counts to log2 counts per million and estimates precision weights from the mean-variance trend, which deliberately changes many of the annoying properties of read count data (the voom paper is very informative). It is therefore not too surprising that the logFC values differ, since the values each tool ends up using for its statistical test are not the raw read counts.
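One concrete way the voom transformation shifts logFC values is the small offset added before taking logs. The sketch below assumes voom's log-CPM definition, log2((count + 0.5) / (library size + 1) × 10^6), equal library sizes, and an unweighted difference of group means; real limma fits also use voom's precision weights, and the counts here are hypothetical. For a low-count gene with a four-fold difference, the offset pulls the logFC from 2 down to log2(3) ≈ 1.58, while a high-count gene with the same ratio is essentially unaffected.

```python
import math
from statistics import mean

def log_cpm(count, lib_size):
    """voom-style log2 counts per million with a 0.5 offset."""
    return math.log2((count + 0.5) / (lib_size + 1) * 1e6)

def logfc_logcpm(g1, g2, lib_size):
    """Difference of group means of log-CPM (unweighted; equal library sizes)."""
    return (mean(log_cpm(c, lib_size) for c in g1)
            - mean(log_cpm(c, lib_size) for c in g2))

def logfc_ratio(g1, g2):
    """log2 ratio of group mean counts, with no offset."""
    return math.log2(mean(g1) / mean(g2))

LIB = 10_000_000  # hypothetical library size, the same for every sample

low = ([4, 4, 4], [1, 1, 1])                     # low counts, 4-fold change
high = ([4000, 4000, 4000], [1000, 1000, 1000])  # high counts, same 4-fold change

for name, (g1, g2) in [("low", low), ("high", high)]:
    print(name, round(logfc_ratio(g1, g2), 3), round(logfc_logcpm(g1, g2, LIB), 3))
```

This shrinkage toward zero at low counts is one of several reasons the limma-voom logFCs can sit systematically below those of the count-based tools.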

As for which one to trust more, there's no good answer. All three values are estimates that have been shaped by the specific assumptions each tool makes. Personally, I would not be concerned as long as the trends are the same, and from a quick glance at your figure the direction seems to be generally the same. Just make sure to (i) note which logFC value you're going with and (ii) use the same type of value for all your samples and comparisons.
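Checking that "the trends are the same" can be made a little more concrete by comparing two logFC vectors gene by gene. The helper below is a hypothetical illustration, not part of limma, edgeR or DESeq2: it reports the fraction of genes whose logFC has the same sign in both tools and the Pearson correlation between the two vectors. The five values per tool are invented.

```python
import math

def agreement(lfc_a, lfc_b):
    """Return (fraction of genes with the same logFC sign, Pearson correlation)."""
    n = len(lfc_a)
    same_sign = sum(1 for a, b in zip(lfc_a, lfc_b) if a * b > 0) / n
    ma, mb = sum(lfc_a) / n, sum(lfc_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(lfc_a, lfc_b))
    sa = math.sqrt(sum((a - ma) ** 2 for a in lfc_a))
    sb = math.sqrt(sum((b - mb) ** 2 for b in lfc_b))
    return same_sign, cov / (sa * sb)

# Hypothetical logFC values for the same five genes from two tools.
limma_lfc = [1.2, -0.8, 2.5, 0.3, -1.9]
deseq2_lfc = [1.5, -1.0, 3.1, -0.1, -2.4]
print(agreement(limma_lfc, deseq2_lfc))
```

Here four of five genes agree in sign (the exception has a logFC close to zero in both tools) and the correlation is high even though the magnitudes differ, which is the kind of pattern described above as not worrying.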