Question

Hugely different results between edgeR and DESeq2

2

Entering edit mode

2.3 years ago

Gama313 ▴ 130

I am working on a 18-patients dataset. I am trying to calculate DE genes with both EdgeR and DESeq2 for further analyses.

Its a 2-factor design (Status, Fraction). I want to calculate DE genes of different Fractions using the Status as covariate. After DE analysis, logFC are comparable between edgeR and DESeq2, but significant genes (FDR<0.05) are different. EdgeR identifies 500 genes, DESeq2 found 1500. Among those 500 DE genes found with edgeR are ALSO DE with DESeq2. This is a huge problem, since edgeR/DESeq2 are supposed to be superimposable. Any suggestion?

edgeR code:

y <- DGEList(counts=counts,group=Fraction)
keep <- rowSums(y$counts) >= 10
y$counts <- y$counts[keep,]
y <- calcNormFactors(y, method="TMM")
design <- model.matrix( ~ Status + Fraction)
y <- estimateDisp(y,design)
fit <- glmQLFit(y,design)
qlf <- glmQLFTest(fit,coef=3)

DESeq2 code:

dds <- DESeqDataSetFromMatrix(countData = counts,colData=summary,design= ~ Status + Fraction)
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
dds <- DESeq(dds)
res <- results(dds)

edgeR RNASeq Expression Differential DESeq2 • 3.3k views

ADD COMMENT • link updated 2.3 years ago by Gordon Smyth ★ 7.9k • written 2.3 years ago by Gama313 ▴ 130

score 7 · Answer 1 · 2023-01-25

edgeR and DESeq generally regarded as having comparable performances on average, not to be superimposable. There are several ways in which edgeR and DESeq are different.

For starters:

They use different normalisation approaches by default
edgeR calculates p-values via the quasi-likelihood method, while DESeq uses Wald tests on the individual coefficients (by default, you can tell DESeq2 to use a more similar Likelhood Ratio Test).
DESeq2 automatically performs outlier removal by default, edgeR doesn't
DESeq2 automatically performs independent hypothesis filtering to find the optimal threshold at which to filter out lowly expressed genes. edgeR doesn't do this.

When people say that egdeR and DESeq2 are equivalent, they mean that when you average across a number of tests and benchmarks they come out more or less the same, not that they give the same answer on every test.

If I had to guess, I'd guess that the biggest difference you are seeing is coming from the final two of those factors I mentioned. You can test this by specifying

res <- results(dds, cooksCutoff=FALSE, independentFiltering=FALSE)

Its also possible to make DESeq2 use an LRT test rather than a Wald test, which I think will make the testing procedure closer to the one used by edgeR:

dds <- DESeq(dds, test="LRT", reduced= ~ Status)

This is not to say that making the DESeq results closer to edgeR results will make them "better" results, but rather, help you to understand the differences. You will then have to make a principled decision, based on your dataset, and the source of the differences, whether to trust the extra 1000 significant p-values provided by DESeq2.

score 7 · Answer 2 · 2023-01-27

DESeq2 is broadly analogous to edgeR's glm pipeline using glmLRT, which constructs genewise tests treating the estimated dispersions as known. You are however using edgeR's quasi-likelihood pipeline, which is a somewhat more conservative pipeline designed to give more rigorous FDR control than either edgeR-glm or DESeq2 by taking dispersion estimation uncertainty into account. On average, it will give slightly fewer significant DE genes that either of the other pipelines.

You might try setting robust=TRUE for glmQLFit which often will increase the number of DE genes.