I have recently begun attempting an enrichment analysis to find out whether any pathways or biological functions are over-represented in a gene set I have. Part of my analysis pipeline creates two different models for predicting the expression levels of a specific list of genes. As a result, I have a subset of this gene list where one model significantly outperforms the other, and I want to perform the enrichment analysis on this subset.
I've tried FUMA-GWAS (https://fuma.ctglab.nl/) and ConsensusPathDB (http://cpdb.molgen.mpg.de/), both of which seem like excellent tools for almost exactly what I'm trying to do. The issue is that they both work by comparing my subset against all human genes (this is technically not the case with FUMA, but I have had some issues using it). This would be fine except that my initial gene list is not randomly chosen, so any enrichment analysis conducted against all genes might just show enrichment of my overall gene list rather than of the subset. Therefore, I need a tool that lets me supply my overall gene list as the background.
Thus far I have drawn a blank on online resources that might allow this sort of analysis. For example, DAVID (https://david.ncifcrf.gov/home.jsp) also does not seem to work with a background list of gene names. So my question is whether there are any other good online resources that might fit these requirements.
I'm aware of the existence of Cytoscape and GSEA, but was hoping there might be something simpler and more easily accessible online than a downloaded software package.
EnrichR certainly looks rather nicely laid out and like it might be quite useful. GATHER doesn't look quite as fancy but also seems to provide some interesting information. Both seem like decent tools to add to what I've already tried, so thanks for the suggestions!
I think the main issue, though, is that without a background list, EnrichR in particular returns very similar results to FUMA (not particularly surprising, I suppose, as they all rely on the same online resources to generate results). While useful, it doesn't solve the main problem: without the background list, the results are a bit meaningless.
For example, both FUMA and EnrichR suggest that my subset of genes is enriched for immune response genes, but that is very plausibly because the original gene list was itself pretty enriched for genes related to immune response.
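To make the concern concrete, here is a minimal sketch of a one-sided hypergeometric over-representation test, which is one common way such enrichment is scored (I am not claiming it is exactly what FUMA or EnrichR compute). The function name `hypergeom_tail` and all of the counts are made up for illustration: a hypothetical 20-gene list with 16 immune genes, a subset of 5 genes with 4 immune genes, and an assumed genome of 20,000 genes with 1,000 immune genes.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n genes without replacement from a
    background of N genes, K of which belong to the category."""
    total = comb(N, n)
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
subset_size = 5          # genes where one model outperforms the other
immune_in_subset = 4     # of those, annotated as immune response

# Background = the original (already immune-heavy) gene list.
p_panel = hypergeom_tail(immune_in_subset, subset_size, K=16, N=20)

# Background = an assumed whole genome.
p_genome = hypergeom_tail(immune_in_subset, subset_size, K=1000, N=20000)

print(f"p-value against original gene list: {p_panel:.3g}")
print(f"p-value against assumed genome:     {p_genome:.3g}")
```

With these made-up numbers, the same subset looks strongly enriched against the whole genome but unremarkable against the original list, which is the effect I am worried about.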
If you are using something like an nCounter experiment (or a targeted gene panel), then you have a good point.
While certain library preparation methods (or array designs) might also benefit from a background adjustment, I think programs like Enrichr and GATHER can still be useful for hypothesis generation. Plus, certain programs like goseq can be automated for relatively quick results, but I usually find that Enrichr / GATHER provide better results than goseq (even though goseq is supposed to be more specifically designed for sequencing experiments).
For example, even with a limited set of genes, you might want to get an idea about some annotations for as few as 2-3 related genes (even though that can otherwise have a non-trivial false discovery rate). However, if all your tested genes are "immune" genes and a general category like "immune genes" is enriched, then you would have to look for lower-ranked enrichment and/or use another program.