Before addressing your specific question, here is a short overview of Gene Ontology (GO) based enrichment analysis.
To perform biological enrichment analysis using an ontology, you need the following data:
- A list of genes perturbed in an experiment (say microarray, next-gen sequencing, proteomics etc.)
- A background list of genes for your study (this could be the list of genes from which you derived the perturbed genes, for example the genes on a microarray or the genes in a given genome)
- An ontology (in this case, gene_ontology)
- A Gene Ontology association file (in this file you can find the GO terms assigned to the genes in the first two lists above)
Note: there are several well-defined biological ontologies, but you may not find corresponding association data for all of them. For the list of available GO association data, see GOA.
Enrichment analysis:
Huang et al. classify enrichment calculations into three categories: singular enrichment analysis (SEA), gene set enrichment analysis (GSEA) and modular enrichment analysis (MEA). The basic difference between these three classes of enrichment algorithms is the way the enrichment p-values are calculated.
In the SEA-based approach, the annotation terms of a subset of genes are assessed one at a time against a list of background genes. An enrichment p-value is calculated by comparing the observed frequency of an annotation term with the frequency expected by chance, and individual terms that pass the p-value cut-off (for example, p-value ≤ 0.05) are reported as enriched. FuncAssociate and Onto-Express are two SEA-based enrichment analysis tools.
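As a rough illustration of the SEA calculation described above, the sketch below computes a one-sided hypergeometric p-value for a single GO term, which is the same quantity as a one-sided Fisher's exact test. The function name and the example counts are hypothetical and are not taken from any of the tools mentioned; multiple-testing correction, which is usually applied when many terms are tested, is left out.

```python
from math import comb


def sea_term_p_value(n_background, n_background_with_term, n_study, n_study_with_term):
    """Upper-tail hypergeometric p-value for one annotation term.

    Assumes the study list is a subset of the background list.
    Returns P(X >= n_study_with_term) when n_study genes are drawn
    at random, without replacement, from the background.
    """
    N, K, n, k = n_background, n_background_with_term, n_study, n_study_with_term
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("counts are inconsistent")
    # Number of ways to draw any study list of size n from the background.
    total = comb(N, n)
    # Number of draws that contain at least k annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical counts: 20000 background genes, 400 annotated with the term,
# 200 perturbed genes, 12 of which carry the term (about 4 expected by chance).
p = sea_term_p_value(20000, 400, 200, 12)
print(f"enrichment p-value: {p:.3g}")
```

A term would then be reported as enriched if this value falls below the chosen cut-off.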
GSEA approaches are similar, but consider all genes during the enrichment analysis instead of only the genes selected by a pre-defined threshold, as in the SEA approach. GSEA from the Broad Institute is an example of a GSEA-based tool.
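To make the "all genes, no threshold" idea concrete, here is a minimal sketch of an unweighted, Kolmogorov-Smirnov-like running-sum enrichment score over a ranked gene list. It is a simplification and not the Broad implementation: it ignores the correlation weighting, and the p-value is estimated by shuffling the gene ranking, which is an assumption made here to keep the example short. All function names and gene identifiers are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Maximum deviation from zero of an unweighted running sum.

    Walking down the ranked list, the sum goes up at genes in the set
    and down at genes outside it; steps are scaled so it ends at zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided empirical p-value from random re-orderings of the genes."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    shuffled = list(ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(enrichment_score(shuffled, gene_set)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical ranking (most up-regulated first) and a hypothetical gene set.
ranking = [f"gene{i}" for i in range(1, 21)]
term_genes = {"gene1", "gene2", "gene4", "gene7"}
print(enrichment_score(ranking, term_genes))
print(permutation_p_value(ranking, term_genes))
```

Because every gene contributes to the running sum, no cut-off is needed to decide which genes count as perturbed.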
MEA-based programs like Ontologizer 2.0 and topGO use the relationships that exist between the annotations. These programs were reported to attain better sensitivity and specificity because they take GO term relationships into account.
These tools are based on similar statistical and algorithmic concepts. See a review of 68 tools published in 2008 here; it shows minor-to-medium level differences in the way the nodes are treated, how the statistics are computed, etc. Statistical methods used to derive the p-value include Fisher’s exact test, the hypergeometric test, the binomial test, the χ2 test, or a combination of these methods.
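For comparison with the exact tests, the sketch below applies two of the other listed methods, the binomial test and the χ2 test, to the same four counts for a single term. It is only meant to show how the tests relate; the function names are hypothetical, the χ2 version has no continuity correction and is two-sided, and the study list is assumed to be a subset of the background.

```python
from math import comb, erfc, sqrt


def binomial_p_value(n_background, n_background_with_term, n_study, n_study_with_term):
    """Upper-tail binomial p-value.

    Treats each study gene as an independent draw that carries the term
    with probability equal to the background frequency.
    """
    p = n_background_with_term / n_background
    n, k = n_study, n_study_with_term
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chi2_p_value(n_background, n_background_with_term, n_study, n_study_with_term):
    """Chi-squared test (1 degree of freedom) on the 2x2 table
    study / rest of background versus annotated / not annotated."""
    a = n_study_with_term                      # study, annotated
    b = n_study - n_study_with_term            # study, not annotated
    c = n_background_with_term - a             # rest, annotated
    d = (n_background - n_study) - c           # rest, not annotated
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n_background * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-squared distribution with 1 d.o.f.
    return erfc(sqrt(chi2 / 2))


# Hypothetical counts, same layout as (background, annotated, study, study annotated).
counts = (20000, 400, 200, 12)
print(f"binomial:    {binomial_p_value(*counts):.3g}")
print(f"chi-squared: {chi2_p_value(*counts):.3g}")
```

The binomial and χ2 tests are approximations that are cheap to compute on large lists, while Fisher's exact test and the hypergeometric test give exact tail probabilities for sampling without replacement.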
You can use one of the R packages, servers or command-line tools to perform such an analysis. See the list of GO-based tools compiled by the AmiGO team here.
Now to your specific question:
Q: what are the criteria for ranking these categories? Are they based on p-values?
A: Yes. They are p-value based. See the section on SEA, GSEA and MEA for the various methods used to derive the p-value.
For a detailed overview of the concepts discussed in this answer, see the following articles: 1, 2, 3, 4, 5, 6
Thank you very much, Khader, for the explanation and the literature! They are very useful for helping me understand GO analysis. I am interested in the statistical tests commonly used in the available enrichment tools, so the background on GO and related tools is definitely helpful for me. Thanks again!