Best (general-purpose) practice for cutting off gene ontology annotation
23 months ago
Yep ▴ 20

Hi, I've collected annotations for a large number of genes from GO and propagated them up to the GO ancestor terms. I am now trying to use these annotations in a general-purpose genomic functional graph (similar to the genomic knowledge graphs that have been developed recently). The problem is that the top-level terms (e.g., at levels 0 and 1) are too general to be informative. However, I don't know at which level I should cut off the GO terms. Some papers also cut off terms based on the number of genes associated with them.

Can someone recommend papers or approaches that give details on trimming GO terms by GO level or by number of associated genes (whether before or after propagating the GO annotations)?

Maybe I should ask geneontologyhelp

gene-ontology go • 1.3k views
23 months ago
Yep ▴ 20

Suggestion from GO staff:

I would suggest you use one of the GO subsets available at http://geneontology.org/docs/download-ontology/

The 'Generic GO subset' is one that should be broadly useful for your case, giving enough granularity to map genes to meaningful classes. Organism-specific subsets are also available.
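To illustrate how a subset can replace a level cutoff, here is a minimal sketch that maps each annotated term to the subset terms found among its ancestors. The toy ontology, the term names, and the `slim` set are all hypothetical; in practice the parent links and the subset membership would be read from the ontology files you download.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in a child -> parents mapping."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def map_to_slim(term, parents, slim):
    """Return the slim terms that are `term` itself or one of its ancestors."""
    return (ancestors(term, parents) | {term}) & slim


# Hypothetical toy ontology (child -> list of parents), not real GO content.
parents = {
    "process": [],
    "metabolism": ["process"],
    "lipid metabolism": ["metabolism"],
    "fatty acid oxidation": ["lipid metabolism"],
    "transport": ["process"],
}
# Hypothetical slim: the broad classes we want to keep in the graph.
slim = {"lipid metabolism", "transport"}

gene_annotations = {"geneA": {"fatty acid oxidation"}, "geneB": {"transport"}}
slimmed = {
    gene: set().union(*(map_to_slim(t, parents, slim) for t in terms))
    for gene, terms in gene_annotations.items()
}
print(slimmed)
```

Terms above the slim (here "process" and "metabolism") never appear in the result, so the overly general top of the ontology is dropped without choosing a level.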

22 months ago

Another thing to keep in mind: the "level" of a GO term has no meaning, so there is no guidance as to which level you would find most helpful in your work. Some of our highest-level terms (and even some quite a bit further down) are intended for grouping purposes only, not as informative terms for annotation or enrichment; see GO:0005488 binding. You can identify these terms by the subset label "gocheck_do_not_annotate".
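As a rough sketch of how that label could be used, the snippet below scans OBO-style `[Term]` stanzas for a given `subset:` tag and removes the matching terms from a gene's annotations. The stanzas in `SAMPLE_OBO` are abbreviated and hand-written for illustration (a real ontology file carries many more tags per term), and `terms_in_subset` is a hypothetical helper, not part of any GO tool.

```python
# Abbreviated, hand-written stanzas for illustration only.
SAMPLE_OBO = """\
[Term]
id: GO:0005488
name: binding
namespace: molecular_function
subset: gocheck_do_not_annotate

[Term]
id: GO:0005515
name: protein binding
namespace: molecular_function
is_a: GO:0005488 ! binding
"""


def terms_in_subset(obo_text, subset):
    """Collect ids of [Term] stanzas that carry `subset: <subset>`."""
    found = set()
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and current_id and line.startswith("subset: "):
            if line[len("subset: "):].split()[0] == subset:
                found.add(current_id)
    return found


grouping_only = terms_in_subset(SAMPLE_OBO, "gocheck_do_not_annotate")

# Drop grouping-only terms from (already propagated) annotations.
annotations = {"geneA": {"GO:0005488", "GO:0005515"}}
filtered = {gene: terms - grouping_only for gene, terms in annotations.items()}
print(filtered)
```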

From our FAQ

GO terms do not occupy strict fixed levels in the hierarchy. Because GO is structured as a graph, terms would appear at different ‘levels’ if different paths were followed through the graph. This is especially true if one mixes the different relations used to connect terms.
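A small sketch makes the point concrete: in a graph where a term has more than one parent, the distance to the root depends on the path taken. The four-term ontology below is hypothetical, and `path_lengths` is an illustrative helper written for this example.

```python
def path_lengths(term, parents):
    """Return the set of distances from `term` to a root over all paths."""
    direct_parents = parents.get(term, [])
    if not direct_parents:
        return {0}  # a root sits at distance 0 from itself
    return {
        depth + 1
        for parent in direct_parents
        for depth in path_lengths(parent, parents)
    }


# Hypothetical DAG (child -> parents): C is reachable from the root
# directly and also through the chain root -> A -> B -> C.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["B", "root"],
}

levels_of_c = path_lengths("C", parents)
print(sorted(levels_of_c))  # → [1, 3]
```

Term C is at "level 1" by one path and "level 3" by another, so a cutoff such as "drop levels 0 and 1" gives a different answer depending on which path is counted.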

A more informative metric would be the information content of the node based on annotations. See, for example, the work of Alterovitz et al.
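One common annotation-based definition is IC(t) = log2(N / n_t), where n_t is the number of genes annotated to term t after propagation and N is the total number of annotated genes. The sketch below uses that definition on made-up data; it is not necessarily the exact measure used in the work cited above, and the term names, gene names, and the cutoff of 1.0 bit are assumptions for illustration.

```python
import math


def information_content(term_to_genes):
    """IC per term as log2(total annotated genes / genes annotated to the term)."""
    total = len(set().union(*term_to_genes.values()))
    return {
        term: math.log2(total / len(genes))
        for term, genes in term_to_genes.items()
        if genes  # skip terms with no annotated genes
    }


# Hypothetical propagated annotations: term -> set of genes.
term_to_genes = {
    "root": {"g1", "g2", "g3", "g4"},
    "mid": {"g1", "g2"},
    "leaf": {"g1"},
}

ic = information_content(term_to_genes)

# Keep only terms that carry at least 1 bit of information (assumed cutoff).
min_ic = 1.0
informative = {term for term, value in ic.items() if value >= min_ic}
print(ic)
print(sorted(informative))
```

A term annotated to every gene gets an IC of 0, so the uninformative top of the ontology falls out of the filter without any reference to levels.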
