Question

How to interpret Enrichr combined score 'infinity' value

4.0 years ago

srs204 • 0

I used Enrichr for the functional enrichment of my gene list. I ranked the GO terms based on their combined score since it is a combination of p-value and z-score. My questions:

1. Is combined score ranking better than p-value based ranking?
2. Is 'infinity' combined score meaningful?

I referred to another post to answer these questions: "the best combine score for enrichment analysis from Enrichr". That post gives the formula for the combined score (cs). Given that cs = log(p-value) * z-score, if the p-value is very close to zero, the log term can go to negative infinity. Otherwise, the z-score itself would have to be infinite.

But from my result, it is clear that the p-value is not too low to make the combined score go to infinity.

Can anyone provide an insight on whether the infinity combined score is meaningful? The GO term associated with infinity score is extremely biologically relevant in our case, but we want to justify this. Thanks!!

Enrichr Enrichment • 1.4k views

updated 6 months ago by mazegriff ▴ 100 • written 4.0 years ago by srs204 • 0

Hello,

Hope all is well. I thought I'd provide my thoughts, for others who may have the same questions.

1. It seems so. In "Enrichr: a comprehensive gene set enrichment analysis web server 2016 update" (Kuleshov et al., 2016), the combined score (combined.score), a composite of the p-value (I assume the unadjusted one) and a correction score, or z-score, measuring deviation from the expected rank, appears to perform better than the other ranking methods for computing enrichment. Relative to other ranking methods such as the proportion test (auc=0.066), the combined score shows a larger positive deviation and a higher AUC (auc=0.086), suggesting that it better distinguishes enriched from non-enriched terms.
2. I believe so. Mathematically, as you stated, the score can only be infinite if one of the terms in its equation is: either the p-value reaches 0, so that its log goes to negative infinity, or the z-score itself is infinite. The former seems more likely; a p-value of exactly 0 would make the output infinite.

combined.score = ln(p_val)*z-score
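To make the arithmetic concrete, here is a minimal Python sketch of the formula exactly as written above. The function name `combined_score_z` and the example values are my own illustration, not Enrichr's code. It shows that under this definition the p-value has to be exactly 0 (for example after floating-point underflow) for the result to be infinite; a p-value that is merely very small still gives a finite score.

```python
import math

def combined_score_z(p_value: float, z_score: float) -> float:
    """Illustrative ln(p) * z score; hypothetical helper, not Enrichr's implementation."""
    if p_value == 0.0:
        # ln(0) is negative infinity; math.log(0.0) raises ValueError,
        # so the limit is handled explicitly here.
        log_p = -math.inf
    else:
        log_p = math.log(p_value)
    return log_p * z_score

# A small but non-zero p-value gives a large, finite score.
print(combined_score_z(1e-5, -2.0))
# Even an extremely small p-value stays finite.
print(math.isfinite(combined_score_z(1e-300, -2.0)))
# Only a p-value of exactly 0 produces an infinite score.
print(combined_score_z(0.0, -2.0))
```

So with this version of the formula, a moderate p-value cannot by itself explain an infinite combined score.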

Since you've confirmed that this pathway enrichment is biologically reasonable, and assuming it is corroborated by other pathway results and your preprocessing checks out, your results appear to support a meaningful experimental hypothesis. Nice.

On another note: In the recent "Gene Set Knowledge Discovery with Enrichr," the combined score is calculated as follows (Xie et al., 2021):

When clicking on the bars, the results are re-sorted by other ranking methods. There are three methods of ranking the results: The Fisher’s exact test, odds ratio, and a method that combines the two . . . The odds ratio ranking method is simply the odds ratio, while the combined score is a multiplication of the odds ratio by the negative natural log of the p-value. It provides a balance between these two methods of ranking.
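As a rough sketch of that second definition (odds ratio multiplied by the negative natural log of the p-value), the Python below uses a plain 2x2 contingency table. The helper names, the table layout, and the numbers are assumptions for illustration only; the source does not say how Enrichr handles zero cells, so this is not a description of its implementation. What it does show is that, under this formula, an infinite odds ratio yields an infinite combined score even when the p-value is unremarkable.

```python
import math

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Plain odds ratio for a 2x2 table (hypothetical layout):
    a = in gene list and in term,     b = in gene list, not in term,
    c = not in gene list, in term,    d = in neither.
    """
    if b * c == 0:
        # Division by zero: the odds ratio is unbounded (or undefined if a*d is also 0).
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)

def combined_score_or(p_value: float, ratio: float) -> float:
    """Odds ratio times -ln(p), following the quoted description; p_value must be > 0."""
    return ratio * -math.log(p_value)

# Ordinary table: finite odds ratio, finite score.
finite_or = odds_ratio(10, 90, 100, 9800)
print(finite_or, combined_score_or(0.001, finite_or))

# Every gene annotated to the term is in the list (c = 0): unbounded odds ratio,
# so the score is infinite although the made-up p-value of 0.01 is moderate.
unbounded_or = odds_ratio(3, 97, 0, 19900)
print(unbounded_or, combined_score_or(0.01, unbounded_or))
```

If that is what happened in the question, the infinite value would reflect a zero cell in the table, not an extreme p-value, but that is a guess that would need checking against the overlap counts.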

There seems to be ambiguity about the definition of the combined score, and I will have to read more on this. Regardless, Enrichr automatically calculates an adjusted p-value (an FDR-corrected p-value, a measure of the statistical significance of the enrichment) and an odds ratio (a measure of effect size, i.e. the magnitude or strength of the enrichment). These are probably most effective when used together for pathway enrichment analysis, along with biological interpretation. For instance, suppose a KEGG pathway is enriched for differentially expressed genes (DEGs) with a high odds ratio (>2, let's say 10) and a very low adjusted p-value (< 0.01). This would suggest two things: the odds of being a DEG are 10 times higher for genes in this pathway than for genes outside it, and the enrichment is statistically significant, since an enrichment this strong would be unlikely to arise by chance (from a random gene distribution) at the 0.01 threshold. This would likely be more meaningful than a pathway whose adjusted p-value is significant at < 0.01 but whose odds ratio is < 2.
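A small Python sketch of using the two statistics together, with the thresholds from the example above (adjusted p-value < 0.01, odds ratio > 2). The rows, the key names `adj_p` and `odds_ratio`, and the function `filter_terms` are all made up for illustration and are not Enrichr's actual column names or output.

```python
# Hypothetical enrichment results; values are invented for illustration.
results = [
    {"term": "pathway A", "adj_p": 0.004, "odds_ratio": 10.0},
    {"term": "pathway B", "adj_p": 0.008, "odds_ratio": 1.4},
    {"term": "pathway C", "adj_p": 0.200, "odds_ratio": 12.0},
]

def filter_terms(rows, max_adj_p=0.01, min_odds_ratio=2.0):
    """Keep terms that pass both the significance and the effect-size cutoff,
    sorted by odds ratio from strongest to weakest."""
    kept = [
        r for r in rows
        if r["adj_p"] < max_adj_p and r["odds_ratio"] > min_odds_ratio
    ]
    return sorted(kept, key=lambda r: r["odds_ratio"], reverse=True)

kept_terms = [r["term"] for r in filter_terms(results)]
print(kept_terms)
```

Here pathway B is significant but weakly enriched, and pathway C is strongly enriched but not significant, so only pathway A passes both filters.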

The combined.score provided by Enrichr (apparently defined in two different ways in separate places) seems to offer an effective way to go beyond the p-value alone. It combines the p-value with either a z-score (a measure of statistical deviation, i.e. how unusual the observed ranking is relative to a background expectation) or an odds ratio (a measure of enrichment strength, i.e. effect size). Despite the apparent ambiguity, the R package provides these statistics as columns, which can be used to filter the pathways down to those that are best supported statistically.

Hope this is helpful.

Best, Maze

6 months ago by mazegriff ▴ 100