Question

Pointers To Learn About Functional Enrichment And GO Analysis

2

12.1 years ago

aravind ramesh ▴ 540

Can anyone please provide links to research articles, review articles, or tutorials that are reasonably comprehensive for learning about GO (Gene Ontology) analysis, functional enrichment, and other types of enrichment such as pathway enrichment? Why should one do these kinds of analysis, and how should the output be analysed and understood? Thanks in advance.

gene-ontology • 4.9k views

updated 12.1 years ago by Gjain 5.8k • written 12.1 years ago by aravind ramesh ▴ 540

score 15 · Answer 1 · 2013-03-11

Hi Aravind,

You can start with:

An Introduction to the Gene Ontology

The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains: cellular component, the parts of a cell or its extracellular environment; molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis; and biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

Tools for Analysis of Data Sets, e.g. gene expression / microarray data

A list of tools that make use of the GO ontologies or the gene associations provided by Consortium members. Being listed on this page does not represent an endorsement by the GO Consortium, nor has the Consortium tested the tool or found that it uses the Consortium information accurately.

Gene Ontology: tool for the unification of biology [Nature 2000]

Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

A genome-wide association study of global gene expression [Nature 2007]

The most highly heritable traits were markedly enriched in Gene Ontology descriptors for response to unfolded protein (chaperonins and heat shock proteins), regulation of progression through the cell cycle, RNA processing, DNA repair, immune responses and apoptosis. SNPs that regulate expression of these genes are candidates in the study of degenerative diseases, malignancy, infection and inflammation. We have created a downloadable database to facilitate use of our findings in the mapping of complex disease loci.

Gene set enrichment analysis with topGO

The topGO package is designed to facilitate semi-automated enrichment analysis for Gene Ontology (GO) terms. The process consists of input of normalised gene expression measurements, gene-wise correlation or differential expression analysis, enrichment analysis of GO terms, interpretation and visualisation of the results.

GOrilla: Gene Ontology enRIchment anaLysis and visuaLizAtion tool

GOrilla is an efficient GO analysis tool with unique features that make a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.
- Link to the paper: GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists

Gene Enrichment Analysis

This lecture introduces the notion of enrichment analysis, where one wishes to assign biological meaning to some group of genes. Whereas in the past each gene product was studied individually to assign it functions and roles in biological processes, there now exist tools that allow this process to be automated. By centralizing and disseminating a wealth of prior knowledge about known genes, the Gene Ontology [1] database allows researchers to assign attributes to groups of genes that emerge from their experiments or analyses. The initial group of genes may be some set that was clustered together through expression analysis, bound by the same transcription factor, or chosen based on prior knowledge. To identify larger patterns within this group is to seek enrichment - to assess whether some subset of the group shows significant over-representation of some biological characteristic.

EasyGO: Gene Ontology-based annotation and functional enrichment analysis tool for agronomical species

EasyGO is a good tool for helping biologists and agricultural scientists to discover enriched biological knowledge that can provide solutions or suggestions for original problems.

Gene Ontology analysis with Python and Bioconductor

The Gene Ontology (GO) project provides a standardized set of terms describing the molecular function of genes. We will use the topGO package from the Bioconductor project to identify over-represented GO terms from a set of differentially expressed genes. Python will be used to prepare the data, utilizing rpy2 to call R for the statistical analysis.

OntologyTraverser: an R package for GO analysis

Gene Ontology (GO) annotations have become a major tool for analysis of genome-scale experiments. We have created OntologyTraverser, an R package for GO analysis of gene lists. Our system is a major advance over previous work because (1) the system can be installed as an R package, (2) the system uses Java to instantiate the GO structure and the SJava system to integrate R and Java and (3) the system is also deployed as a publicly available web tool.

I hope this is a good place to start.

score 3 · Answer 2 · 2013-03-11

These analyses are helpful for assessing which functions (or pathways, publications, protein domains, etc.) are overrepresented in a given group of genes. For example, if you've done a gene expression experiment, you might get hundreds or thousands of genes that are up- or down-regulated. Enrichment analysis gives you an idea of what those genes are and what they do, and it picks out subsets of interesting genes to look at further. Among your thousands of differentially expressed genes, it might be that the set of 400 genes that have the GO term "nervous system development" is of particular interest to you.

Most enrichment analysis of this type uses a hypergeometric distribution to calculate an enrichment value. You can look up the details online if you're interested, but you don't need to know much about it unless you're programming one yourself, since there are plenty of tools already available that will do this for you. The bit you need to understand is that the significance tells you how strongly overrepresented a particular function (or pathway, or publication, etc.) is in your list compared to a background population. Taking the "nervous system development" example: if 400 out of 500 genes in your list are tagged with this term, but 9,900 out of 10,000 genes in the whole genome are tagged with it too, this wouldn't be flagged as significant, because the term is very frequent in general. However, if 400 out of 500 genes in your list are tagged with it, and only 450 out of 10,000 in the entire genome are, then that would be considered highly significant.
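If you do want to see the arithmetic behind the two "nervous system development" cases, here is a minimal sketch in Python using only the standard library. It computes the upper tail of the hypergeometric distribution, i.e. the chance of seeing at least that many tagged genes in a list drawn at random from the genome. The function names (`enrichment_p_value`, `log10_p`) are made up for this illustration, and real tools typically also correct for multiple testing, which this sketch does not do.

```python
from fractions import Fraction
from math import comb, log10


def enrichment_p_value(genome_size, genome_hits, list_size, list_hits):
    """Upper-tail hypergeometric probability P(X >= list_hits).

    genome_size: number of genes in the background (e.g. whole genome)
    genome_hits: background genes tagged with the term
    list_size:   number of genes in your list
    list_hits:   genes in your list tagged with the term
    Returns an exact Fraction, because the value can be far too small
    to represent as a float.
    """
    total = comb(genome_size, list_size)
    favourable = sum(
        comb(genome_hits, i) * comb(genome_size - genome_hits, list_size - i)
        for i in range(list_hits, min(genome_hits, list_size) + 1)
    )
    return Fraction(favourable, total)


def log10_p(p):
    """log10 of a positive Fraction, safe for extremely small values."""
    return log10(p.numerator) - log10(p.denominator)


# Case 1: the term is very common (9,900 of 10,000 genes carry it).
common = enrichment_p_value(10_000, 9_900, 500, 400)
# Case 2: the term is rare (450 of 10,000 genes carry it).
rare = enrichment_p_value(10_000, 450, 500, 400)

print("common term: p =", float(common))
print("rare term:   log10(p) =", round(log10_p(rare), 1))
```

In the common-term case the p-value is exactly 1: only 100 genes in the genome lack the term, so any list of 500 genes must contain at least 400 tagged ones. In the rare-term case the p-value is so small that it underflows an ordinary float, which is why the sketch reports it on a log10 scale.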

There is a range of different tools available that do this sort of analysis. For a quick preview of how it works, I'd recommend uploading a gene list to an InterMine database. They have different databases for different model organisms. I don't know what your favourite one is, but you can find all the links under 'Existing Mines' at intermine.org.