Question

Seeking a tool for enrichment of a small list of ranked genes

2 days ago

Aspire ▴ 360

Tools like enrichR can perform enrichment on a small unranked list of genes.

For gene lists which are the output of an experiment it is possible to supply not only the name of a gene, but also its p-value or another statistic to rank the genes by.

Is there a tool that can perform enrichment on a small, ranked list of genes (one too small to be suitable for GSEA)? That would make it possible to use more information as input, and thus get more precise results.

enrichment • 320 views

ADD COMMENT • link updated 1 day ago by Istvan Albert 101k • written 2 days ago by Aspire ▴ 360

What's preventing you from performing over-representation analysis (ORA) instead of GSEA? Even with GSEA, I believe you could adjust the minimum set size parameter down to single-digit values and it would still work.

ADD REPLY • link 2 days ago by Dunois ★ 2.8k

I am hoping to get more power than with standard ORA. I think that the input set for GSEA needs to be some thousands of genes (not ~50 genes, as in my case).

ADD REPLY • link 2 days ago by Aspire ▴ 360

"I am hoping to get more power than with standard ORA"

You're very much limited at smaller list sizes, because you have fewer ways to estimate whether the genes in your list are truly over-represented/enriched relative to the whole population you are sampling from. Things like permutation testing are not possible.

How small is your list? And do you want a more powerful analysis because you aren't getting meaningful results from a typical ORA approach?

ADD REPLY • link 2 days ago by yura.grabovska ▴ 660

The ORA approach does give sensible results (with around 50 genes). However, I want to try to be more specific (get a better sense of which cell type is characterized by changes in these 50 genes).

Besides, I just wonder in general whether such an approach (taking a ranked list) exists. A typical ORA example does not take into account information that is often available (ranking the genes by a statistic). So, improving it could be of interest.
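To make the point about unused ranking information concrete, here is the usual one-sided hypergeometric formulation of ORA as I understand it (individual tools may differ in how they define the background or correct for multiple testing, so treat this as a sketch rather than any specific tool's method). With $N$ genes in the background, $K$ of them in the gene set, $n$ genes in the input list, and $k$ genes in the overlap:

$$
p = P(X \ge k) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}
$$

Only the four counts enter the formula. The per-gene statistic (p-value, fold change, rank) does not appear anywhere, so reordering the 50 genes leaves the result unchanged.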

ADD REPLY • link 1 day ago by Aspire ▴ 360

If you wanted to get weird with it, you could do something like 80% resampling of your 50-gene list, run ORA on each resample, collect all the results, and count how often each gene set is enriched. This stuff runs quickly, so it will probably take about as long as the original Java GSEA app takes to run. However, with a lot of things like GSEA, ORA, etc., I often find that the choice of reference gene set collection can significantly influence your outcome, so it's often worth dialing that in properly too.
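The resampling idea could be sketched roughly like this in plain Python. This is only an illustration under some assumptions: ORA is taken to be a one-sided hypergeometric test, the function names (`ora_pvalue`, `resampled_ora`) and the parameters `fraction`, `n_iter`, and `alpha` are made up for the example, and the cutoff is applied to raw p-values with no multiple-testing correction.

```python
import random
from collections import Counter
from math import comb


def ora_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: background size, K: gene set size, n: input list size, k: overlap.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def resampled_ora(genes, gene_sets, background,
                  fraction=0.8, n_iter=200, alpha=0.05, seed=0):
    """Fraction of resamples in which each gene set passes the ORA cutoff.

    All parameter names and defaults are assumptions for this sketch.
    """
    rng = random.Random(seed)
    background = set(background)
    # Only genes present in the background can contribute to the test.
    genes = sorted(set(genes) & background)
    size = max(1, round(fraction * len(genes)))
    # Restrict each gene set to the background once, up front.
    sets_in_bg = {name: set(m) & background for name, m in gene_sets.items()}
    N = len(background)

    hits = Counter()
    for _ in range(n_iter):
        subset = set(rng.sample(genes, size))  # sample without replacement
        for name, members in sets_in_bg.items():
            k = len(subset & members)
            # Raw p-value threshold; no multiple-testing correction here.
            if k and ora_pvalue(k, size, len(members), N) < alpha:
                hits[name] += 1
    return {name: hits[name] / n_iter for name in gene_sets}
```

Gene sets that come up in nearly every resample are the ones whose enrichment does not hinge on a handful of genes in the list. In a real analysis you would probably want to correct the p-values within each resample before counting.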

ADD REPLY • link 1 day ago by yura.grabovska ▴ 660

score 0 · Answer 1 · 2024-11-22

When you have a small list of genes, it makes less sense to use statistics.

In that case you should directly interpret the genes for what they are. No statistics are needed.

For example, a "list" containing a single gene that causes a specific phenotype would never be reported as "enriched" in any analysis.

I may have mentioned this in a related post: I wrote this tool to visualize functional annotations for short (or long) lists of genes:

GeneScape: A Python package for gene ontology visualization

https://joss.theoj.org/papers/10.21105/joss.06624