Question

Why is DESeq a better method for finding highly upregulated and downregulated genes?

2

9.4 years ago

simonlab1 ▴ 20

My question is simple. Why is DESeq analysis for RNA-Seq reads considered to be a more reliable method for identifying upregulated/downregulated genes?

RNA-Seq deseq rpkm ChIP-Seq • 5.3k views

ADD COMMENT • link updated 2.2 years ago by Ram 44k • written 9.4 years ago by simonlab1 ▴ 20

16

Because it has been shown that when you let an octopus decide which genes are significantly deregulated, the results cannot be reproduced as well as with DESeq [citation needed]

ADD REPLY • link 9.4 years ago by Ido Tamir 5.2k

0

Kudos for Octopus Joke :)

Glad that others answered his question; I was asking these kinds of questions too when I was new to bioinformatics.

ADD REPLY • link 9.4 years ago by Manvendra Singh ★ 2.2k

2

I think what Ido is trying to say is that your question lacks a second item to compare against: 'better' with respect to what?

btw.: I like the octopus predictor: there was once an octopus with very good results in prediction https://en.wikipedia.org/wiki/Paul_the_Octopus

ADD REPLY • link 9.4 years ago by Michael 55k

0

Fair point. Say you're comparing two different samples and screening for highly upregulated and highly downregulated genes. Which would be the more reliable method, and why: taking the ratio of a gene's RPKM in sample A to its RPKM in sample B, or running a DESeq analysis and picking the genes with the lowest padj values?

Thanks!

ADD REPLY • link 9.4 years ago by simonlab1 ▴ 20

3

If your question ends up becoming, "why should I use a complicated method like DESeq2 or edgeR rather than just doing a t-test on RPKMs?", then have a read through the DESeq2 and edgeR papers and also the paper on limma. The rationale is described in them.

ADD REPLY • link 9.4 years ago by Devon Ryan 105k

2

If you have no replicates, no method is better. The padj values do not make sense without replicates in your data.

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.4 years ago by GouthamAtla 12k

0

I think that GFOLD works quite well when there are no replicates.

ADD REPLY • link 9.4 years ago by Manvendra Singh ★ 2.2k

0

# Sorry for Spam #

Yes, I remember that Thomas Muller said somewhere that he wants to eat that Paul the Octopus :)

ADD REPLY • link updated 2.2 years ago by Ram 44k • written 9.4 years ago by Manvendra Singh ★ 2.2k

0

I'm so glad I don't have to hear about "Orakel Krake Paul" every night on the news any more :)

ADD REPLY • link 9.4 years ago by Devon Ryan 105k

Ram · Answer 1 · 2015-08-05

I don't know if I would call it more reliable, but it does perform additional calculations that other methods don't. For one, DESeq2 "shrinks" the fold changes of genes that have low read counts. I don't pretend to understand the math behind it, but in general it reduces the fold change of any gene that has low read counts in one or both conditions, because genes with low read counts can have exaggerated fold changes.

For example, imagine you have two conditions, each with 3 replicates. For gene A, the control read counts are 1, 2 and 2 (average 1.67) and the experimental read counts are 4, 3 and 4 (average 3.67). Gene B has control values of 100, 200 and 200 and experimental values of 400, 300 and 400. The calculated fold change for both genes is 2.2, and both may also be significant according to the adjusted p-value (I've checked, it happens). However, an average difference of 2 reads is not a lot, and I would not call that differentially expressed unless it is reproduced across many replicates. DESeq2 therefore shrinks the fold change value accordingly.
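To make the arithmetic in that example concrete, here is a small Python sketch. The counts are the ones from the example above. The `pseudocount` argument is purely an illustrative assumption: it is NOT how DESeq2 shrinks fold changes (DESeq2 uses its own statistical model), it is just a crude stand-in that shows why damping hits low-count genes much harder than high-count ones.

```python
from statistics import mean

def fold_change(control, treated, pseudocount=0.0):
    """Ratio of mean counts; the optional pseudocount is added to both means.

    The pseudocount is a hypothetical damping knob for illustration only,
    not DESeq2's shrinkage estimator.
    """
    return (mean(treated) + pseudocount) / (mean(control) + pseudocount)

# Read counts from the example: 3 control and 3 experimental replicates per gene.
gene_a = {"control": [1, 2, 2], "treated": [4, 3, 4]}
gene_b = {"control": [100, 200, 200], "treated": [400, 300, 400]}

for name, gene in (("A", gene_a), ("B", gene_b)):
    raw = fold_change(gene["control"], gene["treated"])
    damped = fold_change(gene["control"], gene["treated"], pseudocount=5.0)
    print(name, round(raw, 2), round(damped, 2))
# A 2.2 1.3
# B 2.2 2.17
```

Both genes start at the same raw fold change of 2.2, but with the (arbitrary) pseudocount of 5 the low-count gene drops to 1.3 while the high-count gene barely moves. That is the qualitative behaviour described above, not a reproduction of DESeq2's numbers.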

I've compared DESeq2 to edgeR, and while I like both methods, edgeR does return many significant genes that have exaggerated fold changes due to low read counts (or zero read counts), whereas DESeq2 shrinks the fold change to where it is generally below my cutoff for differential expression. Generally, when I filter for differential expression I use both the padj value and the fold change value. Unless you have a lot of replicates, low fold changes may not be completely accurate. Thus, I use DESeq2 specifically because it adjusts the fold changes of genes with low read counts.
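Filtering on both padj and fold change could look roughly like the sketch below. The column names `padj` and `log2FoldChange` match what a DESeq2 results table contains, but the rows are invented, and the cutoffs (padj < 0.05, |log2 fold change| >= 1) are assumptions for illustration, not the cutoffs used in the answer above.

```python
def is_hit(row, padj_cutoff=0.05, lfc_cutoff=1.0):
    """True if the gene passes both the padj and the absolute log2 fold change cutoff."""
    padj = row["padj"]
    if padj is None:  # missing padj is treated here as "not significant"
        return False
    return padj < padj_cutoff and abs(row["log2FoldChange"]) >= lfc_cutoff

# Made-up rows standing in for an exported results table.
rows = [
    {"gene": "geneA", "padj": 0.01, "log2FoldChange": 0.4},    # significant, small change
    {"gene": "geneB", "padj": 0.01, "log2FoldChange": 1.1},    # passes both
    {"gene": "geneC", "padj": 0.30, "log2FoldChange": -2.5},   # large change, not significant
    {"gene": "geneD", "padj": None, "log2FoldChange": 3.0},    # no padj available
    {"gene": "geneE", "padj": 0.001, "log2FoldChange": -1.8},  # passes both (downregulated)
]

hits = [row["gene"] for row in rows if is_hit(row)]
print(hits)  # ['geneB', 'geneE']
```

Using the absolute value of the log2 fold change keeps both upregulated and downregulated genes in the same pass.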

score 5 · Answer 2 · 2015-08-05

It's not likely that any method is better for all experiments, and methods can be evaluated across many metrics (accuracy in estimating effect size, control of FDR, sensitivity, robustness, etc.). Here are just a few important ways in which even standard bulk RNA-seq experiments can differ:

number of biological replicates per group
number of groups
experimental design
batch effects
amount of within-group biological variability (big difference between a controlled experiment and a study)
scale of the effect sizes (big or small differences between groups)
proportion of genes/features which show differences between groups
presence of outliers
...

We like to remind users that, with very many replicates and exchangeable samples, rank tests or permutation tests are great because you don't have to make distributional assumptions. It's just that investigators often don't want to spend money on extra experiments when e.g. 3 or 5 replicates per group will suffice in finding the large effects, and allow them to examine more conditions.
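As a rough illustration of why replicate number matters so much for permutation tests, here is a minimal sketch of an exact two-group permutation test on the difference in means for a single gene. The function name and the toy values are hypothetical, and this is not something DESeq2 does; it only shows the smallest p-value such a test can reach at a given group size.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the absolute difference in means.

    Enumerates every way of relabelling the pooled samples into two groups
    of the original sizes (fine for small groups, too slow for large ones).
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = 0
    as_extreme = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point noise in the means.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total

# Perfectly separated toy groups: the best case for detecting a difference.
p_small = permutation_p_value([1, 2, 3], [4, 5, 6])
p_large = permutation_p_value(list(range(1, 9)), list(range(9, 17)))
print(p_small)  # 0.1
print(p_large < 0.001)  # True
```

With 3 samples per group there are only 20 possible relabellings, so even perfectly separated groups cannot get below p = 0.1, before any multiple-testing adjustment. With 8 per group there are 12870 relabellings and the floor drops to about 0.00016.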

With these differences in mind, I'd recommend looking for evaluations by third parties.