Question

deseq2 or edgeR

3

Entering edit mode

5.2 years ago

yusuf ▴ 70

Hi,

I am using DESeq2 and edgeR for my analysis.

When I put cutoff of padj or FDR of 0.05 DESeq2 gives 650 DE and edgeR gives 490 DE genes But when I apply foldchange cutoff (1) also, DESeq2 give 0 DE genes and edgeR gives 352 genes.

Can anyone explain to me?

thanks in advance.

RNA-Seq deseq2 edger • 7.2k views

ADD COMMENT • link updated 2.4 years ago by adc0032 ▴ 20 • written 5.2 years ago by yusuf ▴ 70

1

Entering edit mode

Without showing the code you used it is impossible to reproduce if the difference in results is based on the conceptual differences between the tools or your choice of strategy/parameters. Please share representative code and explain also how the lowlevel processing was done (alignment/quantification etc.).

ADD REPLY • link 5.2 years ago by ATpoint 85k

0

Entering edit mode

count from txt file

for deseq2

dds <- DESeqDataSetFromTximport(count, meta, ~condition)
dds <- DESeq(dds)
res <- results(dds)

for edger

group <- factor(sample$condition)
yg <- calcNormFactors(yg)
design<- model.matrix(~group)
yg <- estimateDisp(yg,design)
fitrt <- glmFit(yg,design)
lrt <- glmLRT(fitrt,coef = ncol(design))

ADD REPLY • link 5.2 years ago by yusuf ▴ 70

0

Entering edit mode

if you want to use an treshold of adj.p of 0.05 in DESeq2 you should already use it in the result function.

res <- results(dds, alpha= 0.05)

if you want to only consider genes which changes in expression by plus or minus 50% you could additionally do this:

 res <- results(dds, alpha= 0.05 ,  lfcThreshold =  0.5855)

ADD REPLY • link 5.2 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

thanks for reply.

I was filtering it after exporting to csv file. I am considering 2 fold change, so I was taking lfc 1 and pvalue 0.05

what about edgeR.

ADD REPLY • link 5.2 years ago by yusuf ▴ 70

0

Entering edit mode

Did you use a foldchange cutoff at the end by filtering or did you generate the result table using the treshold? That should make a difference in DESeq2

ADD REPLY • link 5.2 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

after getting results I export it and then filter it by padj and logFC

ADD REPLY • link 5.2 years ago by yusuf ▴ 70

0

Entering edit mode

That's different. Use the tresholds in the result function

 res <- results(dds, alpha= 0.05 ,  lfcThreshold = 1)

ADD REPLY • link 5.2 years ago by caggtaagtat ★ 1.9k

score 4 · Answer 1 · 2019-09-16

4

Entering edit mode

5.2 years ago

kashiff007 ★ 1.9k

Check what method for ajusting the p-value both DESeq2 and edgeR program are using. P-adjust methods are described here (https://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html).

For the fold change, both of them uses log2 fold change by default but if you are using DESeq2 for estimating your expression from readcounts then it may differ from edgeR. For the comparison I would first estimate the expression value (most likely in TPM; why TPM is better is explained here https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/), then use this for differential analysis with both program.

ADD COMMENT • link 5.2 years ago by kashiff007 ★ 1.9k

2

Entering edit mode

I would not use TPM for DESeq2, because they explicitly state so multiple times in the documentation

ADD REPLY • link 5.2 years ago by caggtaagtat ★ 1.9k

1

Entering edit mode

I am using raw counts

ADD REPLY • link 5.2 years ago by yusuf ▴ 70

score 4 · Answer 2 · 2019-09-16

Hi,

DESeq2 does Foldchange-shrinkage. In the manual they say this would be "useful for visualization and ranking of genes".

So basically, if I'm not mistaken, DESeq2 tests per default, whether the expression between to treatments (or other sample groups) is significantly different. If a gene has a certain gene expression, you would expect that in some biological replicate the expression is a little above the average and in another sample the expression is a little less. With thousands of biological replicates you would expect the expression values to be normally distributed around the average expression value.

This information, the average and the dispertion around the average, is needed in order to test, whether the same gene is significantly different expressed in another treatment (or other sample group). Therefore, DESeq2 estimates the dispertion of the theoretical distribution, which you would get with thousands of replicates. However, it can only use 3 expressen values most of the times due to money reasons. If you falsly estimates the dispertion to low, you could get sometimes false positives in the results. Thats why DESeq2 tries to avoid this by comparing the dispersion between genes of similar gene expression levels under the assumption that genes of the same expression show a similar amount of variation from the average expression.

So first DESeq2 estimates the dispertion for every gene. Subsequently, DESeq2 compares the dispertions of similar-strong expressed genes and shrinkes the estimated dispertion of every gene into the direction of the average dispertion of all genes with the same expression level.

score 4 · Answer 3 · 2019-09-18

4

Entering edit mode

5.2 years ago

Gordon Smyth ★ 7.6k

edgeR and DESeq2 both apply shrinkage to the logFCs, but the default shrinkage applied by edgeR is (deliberately) light touch whereas the DESeq2 shrinkage is heavier. So logFCs from edgeR will often be larger than those from DESeq2 and the result you see is to be expected.

This is an example of why applying a FC cutoff is a bad idea, especially if you don't know what the estimated logFC values mean. edgeR and DESeq2 are sophisticated programs that carefully assess DE for you. Applying your own very ad hoc rule over the top can't be a good idea!

If you really are worried about the size of the FCs, see the glmTreat function in edgeR, which will do a much better job that you're doing now.

Finally, you should add a filtering step to your edgeR pipeline, e.g., using filterByExpr before estimateDisp.

ADD COMMENT • link 5.2 years ago by Gordon Smyth ★ 7.6k

0

Entering edit mode

I know DESeq2 does its own filtering, but would it be a good idea in general to apply edgeR::filterByExpr to the raw counts data before passing them on to DESeq2::DESeq?

ADD REPLY • link 4.0 years ago by Dunois ★ 2.8k

0

Entering edit mode

You can do that technically-speaking but it is not necessary and also not explicitely recommended by the DESeq2 author.

ADD REPLY • link 4.0 years ago by ATpoint 85k

0

Entering edit mode

Wouldn't it make sense to do it for parity in the event of the counts being filtered elsewhere in the analysis?

Context: my data clusters very poorly on the PCA plots (replicates are all over the place), and I will (probably) use RUVSeq to try and account for this noise. In the RUVSeq vignette the counts data are filtered before being passed to the RUVx functions. So I presumed I should just filter the data that'll be fed to DESeq2 as well. Or can I use the additional variables estimated by RUVSeq using the filtered data on the full data set?

ADD REPLY • link 4.0 years ago by Dunois ★ 2.8k

0

Entering edit mode

If you have problems with unwanted variation then your problem is probably not solved by filtering some genes as those you'd filter would be lowly- or not expressed at all and therefore cannot be the drivers of your unwanted variation. I cannot comment without context. You can open a new question if you want, but try to follow the manuals first. If RUV recommends to filter to estimate the surrogates then do it.

ADD REPLY • link 4.0 years ago by ATpoint 85k

0

Entering edit mode

While i understand that applying ad hoc rules can be a bad idea, some biologists start their exploration at the largest differences between treatments (i.e., large logFC values).

As I understand, DESeq2 does not apply FC shrinkage by default anymore, so what could be driving the pattern similarly described in the OP when this is no longer factor? Could this be an effect of the stringency of dispersion shrinkage instead?

ADD REPLY • link 2.4 years ago by adc0032 ▴ 20