Intersection of multiple methods for RNA-Seq differential expression: conservative or crazy?
7.1 years ago
Adamc ▴ 680

Hi,

For a while now, I've been looking at the intersection of the significant results from DESeq2 and edgeR as my standard for determining differentially expressed genes, treating edgeR as a filter over the DESeq2 results. I usually report fold changes from DESeq2, since its fold-change shrinkage for genes with low or highly variable counts is nice for not putting too much weight on results that are significant but low-confidence. My assumption was that requiring the p-values of both methods to be significant would lower the false-positive rate without sacrificing too much in terms of false negatives.
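For concreteness, the gist of my pipeline looks something like this (a simplified, untested sketch, not my exact code; dds is an already-fitted DESeqDataSet, y an edgeR DGEList with dispersions estimated, design the model matrix, and coef = 2 stands in for whatever contrast is of interest):

    library(DESeq2)
    library(edgeR)

    ## DESeq2 results, with shrunken fold changes for the reported effect sizes
    res_deseq <- lfcShrink(dds, coef = 2, res = results(dds))

    ## edgeR quasi-likelihood test on the same contrast
    fit <- glmQLFit(y, design)
    res_edger <- topTags(glmQLFTest(fit, coef = 2), n = Inf)$table

    sig_deseq <- rownames(res_deseq)[which(res_deseq$padj < 0.05)]
    sig_edger <- rownames(res_edger)[res_edger$FDR < 0.05]

    ## edgeR acts as a filter over the DESeq2 calls;
    ## fold changes are reported from DESeq2
    final <- res_deseq[intersect(sig_deseq, sig_edger), ]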

However, I was recently challenged on this assumption by a statistician, and must now consider the possibility that this approach either raises the false-negative rate to an unacceptable level or, conversely, doesn't reduce the false-positive rate enough to justify its use. We talked about actually running an analysis on the SEQC dataset to figure this out. Before I get into that, though, has anyone tackled this idea before? Is there any statistical treatment of the effect of stacking/intersecting different differential expression methods? I haven't come across anything of the sort, though of course I may simply have missed something relevant.

Thanks!

RNA-Seq • differential-expression

One of the issues you're going to run into is that these two packages are quite similar in how they work. It's mostly on the periphery of significance that you get discordant results. By taking the intersection you're effectively just applying a more stringent p-value threshold than either package alone, while ignoring the fact that you're increasing false negatives. If you really insist on intersecting two packages, then at least use limma with voom, so that you can argue you're using genes found significant by two genuinely different statistical models.

BTW, there are innate problems with intersecting lists of significant genes. Suppose you chose a significance threshold of 0.05: you're then saying that a gene with a p-value of 0.03 in one package and just over 0.05 in the other will not be considered further. Does that really make even intuitive sense? It certainly doesn't make statistical sense.
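For what it's worth, the limma-voom route is short. A sketch along these lines, assuming the same counts matrix and design as in your pipeline:

    library(edgeR)   # for DGEList() and calcNormFactors()
    library(limma)

    y <- calcNormFactors(DGEList(counts = counts))
    v <- voom(y, design)                 # log-CPM with precision weights
    fit <- eBayes(lmFit(v, design))      # linear model, not a NB GLM
    res_voom <- topTable(fit, coef = 2, number = Inf)
    sig_voom <- rownames(res_voom)[res_voom$adj.P.Val < 0.05]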


"It's mostly on the periphery of significance that you get discordant results"

Yes, the original intent was to increase stringency. Although both packages use the negative binomial distribution, I've occasionally seen cases where DESeq2 and edgeR assigned fold changes that went in opposite directions even when both p-values were significant, which indicated something strange going on with a gene or the dataset that was then worth investigating (or filtering out).
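The check for those sign flips is simple enough; roughly this, with res_deseq and res_edger as in my question and genes as row names:

    common <- intersect(rownames(res_deseq), rownames(res_edger))
    both_sig <- res_deseq[common, "padj"] < 0.05 &
                res_edger[common, "FDR"] < 0.05
    flipped <- sign(res_deseq[common, "log2FoldChange"]) !=
               sign(res_edger[common, "logFC"])
    common[which(both_sig & flipped)]  # significant in both, opposite directions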

I agree that perhaps it would have made more sense to simply apply a more stringent p-value threshold to one package, instead of intersecting two.


But then you'd be missing those cases where the tools even disagree on the direction of the change. Are those generally cases where a more stringent p-value threshold would have helped? I have no real insight (meaning: I'm too lazy to dig into this right now) into how closely connected the p-value calculation and the variance adjustment are. It's an interesting question, though, and might be worth posting on the Bioconductor support site to poke the authors of the respective packages directly for their thoughts.


Do you have a background in statistics that lets you make this kind of assumption beyond plain intuition? If not, I would strongly recommend against this kind of "experimental" analysis; follow the outstandingly well-documented workflows of edgeR or DESeq2 instead. The point is that, even though what you do is technically possible, you probably did not validate your results in any way.

This is a common problem in bioinformatic analysis: changing parameters away from the defaults can dramatically alter the outcome, and without validation of the results one should really stay with the defaults unless one has expert knowledge. Statistical methods to reduce false differential calls in microarrays/RNA-seq have been under constant development and improvement for more than a decade, so I really do not think that a simple intersection can outperform them. If you want to add something new, better to make use of additional methods for error correction, such as Michael Love's alpine package to correct GC bias, than to use these homebrew methods.


When I started with analysis of RNA-Seq data there was no clear consensus on which analysis approach would become the "gold standard": both DESeq and edgeR were new-ish, and it was not like with Affymetrix microarrays, where limma had been the clear choice for years. Hence this sort of weak ensemble approach. In retrospect, a more statistically sound technique such as formally combining p-values would of course make more sense. Also, we always validate with qPCR on selected genes across a range of fold changes and p-values to confirm that the qPCR and RNA-Seq results are strongly correlated, although I never intentionally selected genes that were significant by only one of the DE approaches.
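For the record, combining the p-values is easy to sketch in base R, e.g. with Fisher's method, with the big caveat that the method assumes independent tests, which two tools run on the same counts certainly are not (so the combined p-values would be anti-conservative). Reusing res_deseq, res_edger, and common from the sketches above:

    ## raw per-gene p-values from both tools, matched by gene ID
    p_d <- res_deseq[common, "pvalue"]
    p_e <- res_edger[common, "PValue"]

    ## Fisher's method: -2 * sum(log p) ~ chi-squared with 2k df (k = 2 tests)
    stat <- -2 * (log(p_d) + log(p_e))
    p_comb <- pchisq(stat, df = 4, lower.tail = FALSE)
    p_adj <- p.adjust(p_comb, method = "BH")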

What I'm trying to establish now is how much "damage" was or could have been done by using this sort of naive approach, and whether this has already been adequately answered or addressed somewhere. I've heard that this sort of intersection approach is not uncommon, so a conclusion on the matter could be useful for the community.


Isn't there any conclusion on this topic yet?

7.1 years ago

Very interesting question, which I have not seen explored extensively.

I cannot contribute any statistical insights, but from a practical standpoint I just wanted to add that we also often check the results of DESeq2 against limma-voom and edgeR, because we've seen that the overlap tends to be a good indicator of whether we should worry and go back to look at the data more closely, or whether we can move ahead. If the overlap is abysmal (especially if edgeR and DESeq2 disagree on a large number of results), that is usually an indication that something funky is going on, which brings out the worst (or the best?) of each method. If the agreement is good, it doesn't really matter which program you go with.
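The check itself is trivial, something along these lines (with sig_deseq, sig_edger, and sig_voom being the per-tool significant gene sets, as in the sketches earlier in this thread):

    sets <- list(DESeq2 = sig_deseq, edgeR = sig_edger, voom = sig_voom)
    lengths(sets)                    # size of each significant set
    length(Reduce(intersect, sets))  # genes called by all three tools

    ## pairwise Jaccard index as a quick agreement score
    jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
    jaccard(sig_deseq, sig_edger)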
