Clustering GO Terms?
7
7
13.8 years ago

Given a set of genes, does anybody have a simple suggestion for clustering the set on the basis of GO terms? (I am generally just interested in biological processes.)

I have a very stringently filtered data set and need a preliminary view of the types of biological processes represented in my reduced data set.

Thanks, D.

gene clustering • 17k views
7
13.8 years ago

It looks like what you want to do is an enrichment analysis for GO terms. In our lab we have developed a tool that lets you run the enrichment analysis in a few easy steps. It is called Gitools and you can find it at http://www.gitools.org.

First, download the GO term gene sets (which you can do within the tool), then run the enrichment with your set of genes and the downloaded gene sets (called modules in Gitools nomenclature).

You can take a look at the tutorials available on the web to get started, and don't hesitate to contact the authors with any questions.
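For readers who want to see what an enrichment test computes, here is a minimal sketch of a one-sided hypergeometric test in plain Python. It is not Gitools code and does not claim to reproduce its statistics; the gene names, the two gene sets, and the function names are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_enrichment_p(population_size, annotated_in_population,
                           study_size, annotated_in_study):
    """Return P(X >= annotated_in_study) when drawing study_size genes
    without replacement from a population in which
    annotated_in_population genes carry the term."""
    upper = min(annotated_in_population, study_size)
    total = comb(population_size, study_size)
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, study_size - k)
        for k in range(annotated_in_study, upper + 1)
    )
    return tail / total


def enrich(study, population, gene_sets):
    """Compute an uncorrected enrichment p-value for every gene set."""
    results = {}
    for term, members in gene_sets.items():
        members = members & population      # ignore genes outside the background
        hits = len(study & members)         # study genes annotated with the term
        results[term] = hypergeom_enrichment_p(
            len(population), len(members), len(study), hits
        )
    return results


# Hypothetical toy data: 10 background genes, 2 made-up gene sets.
population = {f"g{i}" for i in range(1, 11)}
gene_sets = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g1", "g4", "g5", "g6", "g7", "g8"},
}
study = {"g1", "g2", "g3", "g4"}

p_values = enrich(study, population, gene_sets)
for term, p in sorted(p_values.items(), key=lambda item: item[1]):
    print(f"{term}\t{p:.4f}")
```

In this toy example term_A is the more enriched of the two, because all three of its genes fall in the four-gene study set. With real data you would correct the p-values for multiple testing before interpreting them.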

1

Tutorials are now found here.

6
13.8 years ago
Marina Manrique ★ 1.3k

Have you thought about performing a GOSlim analysis? Here you can find what GOSlim stands for: "GO slims are cut-down versions of the GO ontologies containing a subset of the terms in the whole GO. They give a broad overview of the ontology content without the detail of the specific fine grained terms."

There are some GOSlim sets already defined (see link above), but you can always define your own set of GO terms to perform the GOSlim analysis.

In this kind of analysis you start with a set of GO terms and a set of selected terms that we'll call GOSlim set (for example). You then see (browsing the GO Graph) if each of the GO Terms is connected with any term of the GOSlim set. In other words, you translate all the GO terms you have initially into a set of selected (normally of interest) GO terms.
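The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term names and the child-to-parent table are a made-up toy fragment (not real GO IDs or a real ontology file), and `ancestors` and `map_to_slim` are hypothetical helper names. A term is mapped to every slim term that is the term itself or one of its ancestors.

```python
from collections import deque

# Hypothetical toy ontology fragment: each term lists its direct parents.
PARENTS = {
    "T:apoptosis": ["T:cell_death"],
    "T:cell_death": ["T:cellular_process"],
    "T:glycolysis": ["T:carbohydrate_metabolism"],
    "T:carbohydrate_metabolism": ["T:metabolic_process"],
    "T:cellular_process": ["T:biological_process"],
    "T:metabolic_process": ["T:biological_process"],
    "T:biological_process": [],
}

# Hypothetical slim: the broad terms we want to summarise into.
SLIM = {"T:cellular_process", "T:metabolic_process"}


def ancestors(term, parents):
    """Return the term plus all terms reachable by following parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def map_to_slim(terms, slim, parents):
    """Map each input term to the slim terms found among its ancestors."""
    return {term: sorted(ancestors(term, parents) & slim) for term in terms}


mapping = map_to_slim(["T:apoptosis", "T:glycolysis"], SLIM, PARENTS)
for term, slim_terms in mapping.items():
    print(term, "->", slim_terms)
```

A term with no slim ancestor maps to an empty list, which is a useful signal that the chosen slim set does not cover that part of the graph.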

HTH. Marina

0

I forgot to point out that, first of all, you need to know the GO annotation for your set of genes.

0

I'd like to show you this app for performing the GOSlim analysis in a user-friendly way: http://blog.bio4j.com/?p=9. It's open source and freely available.

6
13.8 years ago
Treylathe ▴ 950

DAVID (http://david.abcc.ncifcrf.gov/) can do that: it takes a list of genes and clusters them based on functional GO annotations. There are a lot of other tools there, and you can fine-tune the analysis quite a bit, but that might serve your purposes.

4
13.8 years ago
Aswarren ▴ 60

If you actually want to cluster genes based on GO terms, you need to calculate the semantic similarity between all pairs of genes and then cluster them. I know you can do this with GOSim http://goo.gl/YvqlL (an R package) and a little help from one of R's clustering algorithms. The R package GOSemSim http://goo.gl/DXwBS might also be useful, though I have not used it. You also need to decide which semantic similarity metric to use http://goo.gl/fMQYS (though not all are implemented in those packages). To interpret the results of the clustering, or to just do enrichment analysis, I recommend using the Ontologizer http://goo.gl/6ejVG. It is flexible and allows you to specify the ontology, the population set, the study set, and the annotations themselves. As for the enrichment method, I like MGSA http://goo.gl/T1NWl, which is also implemented in the Ontologizer.
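To make the "pairwise similarity, then cluster" idea concrete, here is a rough Python sketch. It deliberately swaps in a much simpler measure than the semantic similarity metrics offered by GOSim or GOSemSim: a plain Jaccard index over each gene's set of annotated terms, followed by a naive average-linkage agglomerative clustering. The gene names, term labels, function names, and the 0.4 cutoff are all hypothetical.

```python
from itertools import combinations

# Hypothetical annotations: gene -> set of term labels
# (assumed to be already propagated up to ancestor terms).
ANNOTATIONS = {
    "geneA": {"t1", "t2", "t3"},
    "geneB": {"t1", "t2", "t4"},
    "geneC": {"t7", "t8"},
    "geneD": {"t7", "t8", "t9"},
}


def jaccard(a, b):
    """Shared terms divided by all terms; a crude stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_linkage(c1, c2, annotations):
    """Mean pairwise gene similarity between two clusters."""
    pairs = [(x, y) for x in c1 for y in c2]
    return sum(jaccard(annotations[x], annotations[y]) for x, y in pairs) / len(pairs)


def cluster(annotations, min_similarity):
    """Repeatedly merge the two most similar clusters until the best
    remaining pair falls below min_similarity."""
    clusters = [frozenset([gene]) for gene in sorted(annotations)]
    while len(clusters) > 1:
        first, second = max(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], annotations),
        )
        if average_linkage(first, second, annotations) < min_similarity:
            break
        clusters = [c for c in clusters if c not in (first, second)]
        clusters.append(first | second)
    return sorted(sorted(c) for c in clusters)


result = cluster(ANNOTATIONS, min_similarity=0.4)
print(result)
```

For a real analysis you would replace `jaccard` with a proper semantic similarity measure and probably use an established clustering implementation; the sketch only shows the shape of the computation.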

2
13.8 years ago

Usually this is done the other way around: you cluster or sub-select genes by some condition, then you look for GO enrichment within the groups. You could first try that on your group, perhaps using MEV to do it.

The main problem (and this may already be solved in some publications that I am not aware of) with clustering directly by GO terms is defining a similarity metric that would properly characterize any two GO terms. Intuitively that just does not seem possible over more distant GO terms.

2
13.8 years ago

If what you want to do is indeed enrichment analysis for GO terms, you might want to check this question. The GO-Elite approach that I mentioned there is more or less the opposite of the GOSlim approach, as it finds the most distant leaves on the GO tree first. The other answers should be of interest as well.

0

Couldn't edit my own (old) post. Wanted to add that a GO-Elite paper has now been published. It is at: http://dx.doi.org/10.1093/bioinformatics/bts366

1
13.8 years ago
Carl ▴ 80

Hi,

You might want to give SimCT a try (http://tagc.univ-mrs.fr/SimCT/). It does exactly this: it builds a tree based on similarities of GO annotations for a set of genes.

c

