Question

Enrichment analysis software examples

2

5.8 years ago

s.may-wilson ▴ 60

I recently began attempting an enrichment analysis to find out whether any pathways or biological functions are over-represented in a gene set I have. Part of my analysis pipeline creates two different models for predicting the expression levels of a specific list of genes. This gives me a subset of the gene list where one model significantly outperforms the other, and I want to perform the enrichment analysis on this subset.

I've tried FUMA-GWAS (https://fuma.ctglab.nl/) and ConsensusPathDB (http://cpdb.molgen.mpg.de/), both of which seem like excellent tools for almost exactly what I'm trying to do. The issue is that they both work by comparing my subset against all human genes (this is technically not the case with FUMA, but I have had some issues using it). This would be fine except that my initial gene list is not randomly chosen, so an enrichment analysis conducted against all genes might just show enrichment of my overall gene list rather than of the subset. Therefore I need a tool that lets me supply my overall gene list as the background.

Thus far I have drawn a blank on online resources that allow this sort of analysis. For example, DAVID (https://david.ncifcrf.gov/home.jsp) also does not seem to work with a background list of gene names. So my question is whether there are any other good online resources that fit these requirements.

I'm aware of Cytoscape and GSEA, but was hoping there might be something simpler and more easily accessible online than a downloaded software package.

enrichment analysis pathway gsea FUMA GWAS • 2.4k views

ADD COMMENT • link updated 5.8 years ago by Jean-Karim Heriche 27k • written 5.8 years ago by s.may-wilson ▴ 60

score 2 · Answer 1 · 2019-08-29

2

5.8 years ago

Charles Warden 8.3k

I would recommend trying out Enrichr or GATHER:

https://amp.pharm.mssm.edu/Enrichr/

https://changlab.uth.tmc.edu/gather/

However, those don't let you specify a background set. While that often doesn't prevent getting useful results, you can specify a background set in IPA (in addition to DAVID). IPA is commercial software, though, whereas the links above are for free programs.

ADD COMMENT • link 5.8 years ago by Charles Warden 8.3k

1

EnrichR certainly looks rather nicely laid out and like it might be quite useful. GATHER doesn't look quite as fancy but also seems to provide some interesting information as well. Both seem like decent tools to add to what I've already tried, so thanks for the suggestions!

I think the main issue, though, is that without the background list, EnrichR in particular returns very similar results to FUMA run without a background list (not particularly surprising, I suppose, as they rely on the same online resources to generate results). While useful, that doesn't solve the main problem: without the background list the results are a bit meaningless.

For example, both FUMA and EnrichR suggest that my subset of genes is enriched for immune response genes, but that is very plausibly because the original gene list was already quite enriched for genes related to immune response.

ADD REPLY • link 5.8 years ago by s.may-wilson ▴ 60

0

If you are working with something like an nCounter experiment (or a targeted gene panel), then you have a good point.

While certain library preparation methods (or array designs) might also benefit from a background adjustment, I think these can be useful programs for hypothesis generation. Plus, certain programs like goseq can be automated for relatively quick results, but I usually find that Enrichr / GATHER provide better results than goseq (even though goseq is supposed to be more specifically designed for sequencing experiments).

For example, even with a limited set of genes, you might want to get an idea about some annotations for even 2-3 related genes (even though that can otherwise have a non-trivial false discovery rate). However, if all your tested genes are "immune" genes and a general category like "immune genes" is enriched, then you would either have to look for lower-ranked enrichment and/or use another program.

ADD REPLY • link 5.8 years ago by Charles Warden 8.3k

score 1 · Answer 2 · 2019-08-30

1

5.8 years ago

Jean-Karim Heriche 27k

You definitely need to use a custom background list that reflects your preselection of genes.
GSEA should allow you to create a custom gene set but you'll have to dig through the docs to find out how to do it. In R, the topGO package also allows you to work with a custom background gene set.
Another option is to implement overrepresentation tests yourself, e.g. in R; most are based on the hypergeometric distribution.
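
As a rough illustration of what such a test looks like, here is a minimal sketch in Python (rather than R) using only the standard library. The function name, the gene IDs and the "immune" annotation are all made up for the example; in practice the annotated set would come from a real resource such as GO or a pathway database, and p-values across many terms would need multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(subset, background, annotated):
    """One-sided hypergeometric p-value, P(X >= k).

    subset     : genes of interest (e.g. where one model wins)
    background : the full preselected gene list the subset was drawn from
    annotated  : genes carrying the term/pathway being tested
    Genes outside the background are ignored.
    """
    background = set(background)
    subset = set(subset) & background
    annotated = set(annotated) & background

    N = len(background)          # background size
    K = len(annotated)           # annotated genes in the background
    n = len(subset)              # subset size
    k = len(subset & annotated)  # annotated genes in the subset

    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical data: a 1000-gene "genome" in which 10% of genes are immune,
# and a preselected 100-gene list in which 60% are immune.
genome = [f"g{i}" for i in range(1000)]
gene_list = [f"g{i}" for i in range(100)]
immune = [f"g{i}" for i in range(60)] + [f"g{i}" for i in range(100, 140)]

# Subset of 20 genes, 12 of them immune (the same 60% as the gene list).
subset = [f"g{i}" for i in range(12)] + [f"g{i}" for i in range(60, 68)]

p_vs_genome = hypergeom_enrichment_p(subset, genome, immune)
p_vs_list = hypergeom_enrichment_p(subset, gene_list, immune)
print(f"vs whole genome:      p = {p_vs_genome:.3g}")
print(f"vs preselected list:  p = {p_vs_list:.3g}")
```

With these made-up numbers the subset looks strongly enriched for immune genes against the whole genome, but not against the preselected list, which is exactly the effect a custom background is meant to control for.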

ADD COMMENT • link 5.8 years ago by Jean-Karim Heriche 27k

0

Agreed. The analysis without the background list is relatively interesting, but not particularly meaningful.

I have to say I was hoping there would be some online resources I could use rather than having to run anything in R or download packages, but that now seems unlikely: I couldn't find anything myself, and none of the resources I did find allow background lists.

Using topGO sounds like a decent solution. As well as that, though, there has been a recent paper which shows how to carry out a series of functional enrichment tests (https://www.nature.com/articles/s41596-018-0103-9), so I might follow the pipeline they've created.

For the moment I intend to wait a little longer just in case I get lucky!

ADD REPLY • link 5.8 years ago by s.may-wilson ▴ 60