'How To' On Gene Ontology Analysis
13.1 years ago
Sheila ▴ 280

Hi All,

I am trying to put together a 'HOW TO' on gene ontology analysis. If you know the answers to these questions, please help. I hope it will be useful to everyone who is new to gene ontology analysis and bioinformatics (like me :) Please post answers using one specific tool/language (preferably R, Python or Perl).

Given a GO ID (e.g. GO:0090342), how to:

  1. find all its children down to a specific depth, for example down to the 4th level
  2. find all its parents and grandparents up to a specific height, for example up to the 4th level
  3. get the name (definition) of the GO ID
  4. draw the tree (with the name (definition) and GO ID of each term)
  5. find all genes/proteins directly associated with a specific GO ID
  6. find all genes/proteins associated with a specific GO ID and its children (down to level n)

Given a gene/protein ID or name (e.g. a UniProtKB ID), how to:

  1. find all associated GO IDs of a specific type (e.g. all GO IDs associated with a UniProtKB ID that belong to 'biological process')
  2. remove all redundant GO IDs from the result of the last query (e.g. GO:0090342 and GO:0050793 are both associated with p53; when one of them is a child of the other, I want to drop the parent and keep only the more specific child term)
  3. check whether a specific GO ID is associated with a given UniProtKB ID

(Some questions may repeat or extend previous ones, but it is still good to have a direct answer to each.)

Tags: gene, r, python, ontology
13.1 years ago

All of your questions can be answered by querying the GO database using GOOSE, the GO Online SQL Environment (http://berkeleybop.org/goose), or AmiGO, the search/browse tool provided by the Gene Ontology at http://amigo.geneontology.org. Both tools have help documentation, and GOOSE comes with a substantial list of example database queries that covers several of your questions.

13.1 years ago
Guangchuang Yu ★ 2.6k

1. You can define a function like the following:

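getGOLevel <- function(Node="GO:0003674", Children=GOMFCHILDREN, level) {
    # Children: a GO.db children map (GOBPCHILDREN, GOMFCHILDREN or GOCCCHILDREN)
    # Walk down (level - 1) steps; level = 2 returns the direct children
    for (i in seq_len(level-1)) {
        Node <- mget(Node, Children, ifnotfound=NA)  # children of each current term
        Node <- unique(unlist(Node))                 # flatten into one vector
        Node <- as.vector(Node)                      # drop any remaining names
        Node <- Node[!is.na(Node)]                   # drop NA (leaf terms, unmapped IDs)
    }
    return(Node)
}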

This function is adapted from getGOLevel in my package clusterProfiler. It needs GO.db to be loaded:

require(GO.db)

getGOLevel(Node="GO:0090342", Children=GOBPCHILDREN,level=2)

[1] "GO:0090343" "GO:0090344" "GO:2000772"
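Note that getGOLevel returns only the terms exactly (level - 1) steps below the start term, while the question asks for all children down to a given depth. A minimal base-R sketch that accumulates every level is below. The helper name descendants_upto and the toy IDs are made up for illustration; the map is assumed to be a named list of parent → children vectors, such as the one as.list(GOBPCHILDREN) returns in GO.db (which may contain NA for leaf terms).

```r
# Hypothetical helper: all terms reachable from `root` within `depth` steps
# in a parent -> children map (named list of character vectors).
descendants_upto <- function(root, children, depth) {
  found <- character(0)   # every descendant collected so far
  frontier <- root        # terms reached in the previous step
  for (step in seq_len(depth)) {
    nxt <- unlist(lapply(frontier, function(id) children[[id]]), use.names = FALSE)
    nxt <- unique(nxt[!is.na(nxt)])        # drop NA entries for leaf terms
    nxt <- setdiff(nxt, c(found, root))    # GO is a DAG: skip terms already reached
    if (length(nxt) == 0) break
    found <- c(found, nxt)
    frontier <- nxt
  }
  found
}

# Toy map with made-up IDs, for illustration only
toy_children <- list(
  "GO:A" = c("GO:B", "GO:C"),
  "GO:B" = "GO:D",
  "GO:C" = c("GO:D", "GO:E"),
  "GO:D" = "GO:F"
)
descendants_upto("GO:A", toy_children, depth = 2)  # "GO:B" "GO:C" "GO:D" "GO:E"
```

With GO.db loaded, something like descendants_upto("GO:0090342", as.list(GOBPCHILDREN), 4) should follow the same logic on the real ontology; converting the whole map once with as.list() avoids repeated lookups.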

2. This works the same way as question 1, except that it walks up the parent map instead of down the children map.

The function becomes:

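getGOLevel <- function(Node="GO:0090342", Parent=GOBPPARENTS, level) {
    # Parent: a GO.db parents map (GOBPPARENTS, GOMFPARENTS or GOCCPARENTS)
    # Walk up (level - 1) steps; level = 2 returns the direct parents
    for (i in seq_len(level-1)) {
        Node <- mget(Node, Parent, ifnotfound=NA)    # parents of each current term
        Node <- unique(unlist(Node))                 # flatten into one vector
        Node <- as.vector(Node)                      # drop any remaining names
        Node <- Node[!is.na(Node)]                   # drop NA (unmapped IDs)
    }
    return(Node)
}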

We can test it with:

getGOLevel(Node="GO:0090342", Parent=GOBPPARENTS, level=4)

[1] "GO:0032502" "GO:0008150" "GO:0065007"
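As in question 1, this returns only terms reachable in exactly (level - 1) steps. If you want every ancestor up to a given height, together with the height at which it first appears, a breadth-first walk can record the shortest distance. This is a sketch under assumptions: ancestors_with_height and the toy IDs are hypothetical, and the map is assumed to be a named list of child → parents vectors, such as as.list(GOBPPARENTS).

```r
# Hypothetical helper: breadth-first walk up a child -> parents map, recording
# for each ancestor the smallest number of steps (height) from `term`.
ancestors_with_height <- function(term, parents, max_height) {
  height <- integer(0)   # named vector: ancestor ID -> height
  frontier <- term
  for (h in seq_len(max_height)) {
    up <- unlist(lapply(frontier, function(id) parents[[id]]), use.names = FALSE)
    up <- unique(up[!is.na(up)])
    up <- setdiff(up, c(names(height), term))  # keep only the shortest height
    if (length(up) == 0) break
    height[up] <- h
    frontier <- up
  }
  height
}

# Toy map with made-up IDs, for illustration only
toy_parents <- list(
  "GO:D" = c("GO:B", "GO:C"),
  "GO:B" = "GO:A",
  "GO:C" = "GO:A",
  "GO:A" = "GO:ROOT"
)
ancestors_with_height("GO:D", toy_parents, max_height = 2)
# returns c("GO:B" = 1L, "GO:C" = 1L, "GO:A" = 2L)
```

names() of the result gives the ancestor IDs, and filtering on the values (e.g. h[h <= 3]) selects a narrower height without walking the graph again.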

3. This can be answered directly with the GO2Term function from my package clusterProfiler (the ::: operator reaches an internal, unexported function, so it may change between versions):

clusterProfiler:::GO2Term("GO:0090342")

                GO:0090342 
"regulation of cell aging" 

4. I am not familiar with drawing GO trees.
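One low-tech option for drawing is to export the terms as a Graphviz DOT graph and render it with the Graphviz command-line tools. The sketch below is a suggestion, not part of the answer above: go_to_dot is a hypothetical helper, it assumes a parent → children named list plus an optional named vector of term names, and it does not escape double quotes inside term names. The example edges and name are taken from the outputs in this answer (the children of GO:0090342 and its term name).

```r
# Hypothetical helper: write a parent -> children map as Graphviz DOT text,
# labelling each node with its GO ID and, when known, its term name.
go_to_dot <- function(children, term_names = character(0)) {
  ids <- unique(c(names(children), unlist(children, use.names = FALSE)))
  ids <- ids[!is.na(ids)]
  # "\\n" is a literal backslash-n, which DOT renders as a line break
  labels <- ifelse(ids %in% names(term_names),
                   paste0(ids, "\\n", term_names[ids]), ids)
  nodes <- sprintf('  "%s" [label="%s"];', ids, labels)
  edges <- unlist(lapply(names(children), function(p) {
    kids <- children[[p]]
    sprintf('  "%s" -> "%s";', p, kids[!is.na(kids)])  # one edge per child
  }), use.names = FALSE)
  paste(c("digraph GO {", nodes, edges, "}"), collapse = "\n")
}

example_children <- list("GO:0090342" = c("GO:0090343", "GO:0090344", "GO:2000772"))
example_names <- c("GO:0090342" = "regulation of cell aging")
dot_text <- go_to_dot(example_children, example_names)
writeLines(dot_text)
```

Saving the text with writeLines(dot_text, "go_tree.dot") and running `dot -Tpng go_tree.dot -o go_tree.png` should produce an image; the map could come from descendants collected with GOBPCHILDREN and names from GO2Term.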

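5. For human, the org.Hs.eg.db package maps GO IDs to the Entrez Gene IDs directly annotated with them (org.Hs.egGO goes the other way, from Entrez Gene IDs to GO IDs):

library(org.Hs.eg.db)
mget(GOID, org.Hs.egGO2EG, ifnotfound=NA)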
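The reverse map org.Hs.egGO (Entrez gene → GO) can also cover the gene-level questions 1 and 3. As far as I know, each mapped value is a list of entries with GOID, Evidence and Ontology fields; the sketch below assumes that shape and uses a hand-made toy annotation list with made-up IDs, not real data. The helper names are hypothetical. UniProtKB accessions would first need to be mapped to Entrez IDs (org.Hs.egUNIPROT maps Entrez IDs to UniProt accessions, and revmap() reverses a map), which is not shown.

```r
# Hypothetical helpers for one gene's GO annotations, assumed to be a list of
# entries shaped like list(GOID = ..., Evidence = ..., Ontology = ...).
go_ids_by_ontology <- function(annotations, ontology = "BP") {
  keep <- vapply(annotations, function(a) identical(a$Ontology, ontology), logical(1))
  # the same GO ID can appear once per evidence code, hence unique()
  unique(vapply(annotations[keep], function(a) a$GOID, character(1), USE.NAMES = FALSE))
}

has_go_id <- function(annotations, go_id) {
  any(vapply(annotations, function(a) identical(a$GOID, go_id), logical(1)))
}

# Toy annotations with made-up IDs, for illustration only
toy_annotations <- list(
  list(GOID = "GO:A", Evidence = "IDA", Ontology = "BP"),
  list(GOID = "GO:B", Evidence = "IEA", Ontology = "MF"),
  list(GOID = "GO:A", Evidence = "TAS", Ontology = "BP"),
  list(GOID = "GO:C", Evidence = "IEA", Ontology = "CC")
)
go_ids_by_ontology(toy_annotations, "BP")  # "GO:A"
has_go_id(toy_annotations, "GO:C")         # TRUE
```

With org.Hs.eg.db loaded, go_ids_by_ontology(org.Hs.egGO[["1029"]], "BP") should list the biological-process terms of Entrez gene 1029, assuming the structure above.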

6. This can also be answered directly with the getGO2ExtID function from my package clusterProfiler, as shown below.

clusterProfiler:::getGO2ExtID("GO:0090342", organism="human")

$`GO:0090342`
 [1] "1029"  "2305"  "3159"  "4000"  "4282"  "5728"  "7471"  "8091"  "9891"
[10] "10783" "51343" "54708" "87178"
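The question asks for genes down to level n only. org.Hs.egGO2ALLEGS maps a GO ID to the genes annotated to it or to any of its descendants, without a depth limit; to stop at level n, one option is to collect the term and its descendants down to n and take the union of their direct annotations. A minimal sketch: genes_upto_level is hypothetical, and both maps are toy named lists standing in for, e.g., as.list(GOBPCHILDREN) and as.list(org.Hs.egGO2EG).

```r
# Hypothetical helper: genes directly annotated to `go_id` or to any of its
# descendants at most `n` levels below it (n = 0 means the term alone).
genes_upto_level <- function(go_id, children, go2genes, n) {
  terms <- go_id
  frontier <- go_id
  for (step in seq_len(n)) {
    nxt <- unlist(lapply(frontier, function(id) children[[id]]), use.names = FALSE)
    nxt <- setdiff(unique(nxt[!is.na(nxt)]), terms)  # new terms at this level
    if (length(nxt) == 0) break
    terms <- c(terms, nxt)
    frontier <- nxt
  }
  genes <- unlist(lapply(terms, function(id) go2genes[[id]]), use.names = FALSE)
  sort(unique(genes[!is.na(genes)]))
}

# Toy maps with made-up IDs, for illustration only
toy_children <- list("GO:A" = c("GO:B", "GO:C"), "GO:B" = "GO:D")
toy_go2genes <- list("GO:A" = "g1", "GO:B" = c("g2", "g3"),
                     "GO:C" = "g3", "GO:D" = "g4")
genes_upto_level("GO:A", toy_children, toy_go2genes, n = 1)  # "g1" "g2" "g3"
```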

7.3 years ago
gil.hornung ▴ 100

Regarding the first two items in your list:

I just found the R package GO.db.

You can use the following functions:

  • GOxxPARENTS: the parents of the term
  • GOxxANCESTOR: the parents, and all their parents and so on.
  • GOxxCHILDREN: the children of the term
  • GOxxOFFSPRING: the children, their children and so on out to the leaves of the GO graph.

Replace xx with BP, MF or CC according to the ontology (Biological Process, Molecular Function, Cellular Component).

For example, to find the children of the Cellular Component term GO:0005886 (plasma membrane):

library(GO.db)
GOCCCHILDREN$"GO:0005886"

You can then loop over the results to find their children, repeating as many times as you need, or use GOxxOFFSPRING to get all levels at once.
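The ANCESTOR maps also fit the asker's second gene-level question: to keep only the most specific terms in a set, drop every term that appears among the ancestors of another term in the same set. A minimal sketch, assuming a term → ancestors named list such as as.list(GOBPANCESTOR) (a term is not listed as its own ancestor); drop_redundant_ancestors and the toy IDs are made up for illustration.

```r
# Hypothetical helper: keep only the most specific terms in `go_ids` by
# removing every term that is an ancestor of another term in the set.
drop_redundant_ancestors <- function(go_ids, ancestors) {
  go_ids <- unique(go_ids)
  anc_of_others <- unique(unlist(lapply(go_ids, function(id) ancestors[[id]]),
                                 use.names = FALSE))
  go_ids[!go_ids %in% anc_of_others]
}

# Toy ancestor map with made-up IDs, for illustration only
toy_ancestors <- list(
  "GO:B" = "GO:A",
  "GO:C" = "GO:A",
  "GO:D" = c("GO:B", "GO:C", "GO:A"),
  "GO:E" = c("GO:C", "GO:A")
)
drop_redundant_ancestors(c("GO:A", "GO:B", "GO:D", "GO:E"), toy_ancestors)
# returns "GO:D" "GO:E"
```

For a gene's biological-process terms this would be called with the BP ancestor map; mixing ontologies needs the matching map for each term.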

13.1 years ago
Cshao • 0

All of these questions can be answered in any well-known programming language (Java, C/C++, Python, Perl, ...), because they basically come down to plain-text parsing: download the GO file (http://www.geneontology.org/GO.downloads.ontology.shtml) in your favorite format and parse it.

In my opinion, if you plan to do a lot of work with GO, it is better to use a programming language than a specific tool.
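As an illustration of the parsing approach, here is a minimal base-R sketch for the [Term] stanzas of an OBO-format file. It is an assumption-laden sketch, not a full parser: parse_obo_terms is hypothetical, it reads only the id, name, namespace, def and is_a tags, and it ignores relationship lines (part_of, regulates), obsolete flags and every other stanza type. The toy stanzas use made-up IDs and text.

```r
# Hypothetical minimal parser for the [Term] stanzas of an OBO file.
parse_obo_terms <- function(lines) {
  terms <- list()
  current <- NULL   # the [Term] stanza being filled; NULL outside [Term] stanzas
  for (line in c(trimws(lines), "[End]")) {   # sentinel header flushes the last stanza
    if (startsWith(line, "[")) {
      if (!is.null(current)) terms[[length(terms) + 1]] <- current
      current <- NULL
      if (line == "[Term]") {
        current <- list(id = NA_character_, name = NA_character_,
                        namespace = NA_character_, def = NA_character_,
                        is_a = character(0))
      }
    } else if (!is.null(current) && grepl(":", line, fixed = TRUE)) {
      tag <- sub(":.*$", "", line)                      # text before the first colon
      value <- sub("^[^:]*:[[:space:]]*", "", line)     # text after it
      if (tag %in% c("id", "name", "namespace")) current[[tag]] <- value
      if (tag == "def") current$def <- sub('^"(.*)".*$', "\\1", value)  # quoted part only
      if (tag == "is_a") current$is_a <- c(current$is_a, sub("[[:space:]]*!.*$", "", value))
    }
  }
  names(terms) <- vapply(terms, function(t) t$id, character(1))
  terms
}

# Toy OBO text with made-up IDs, for illustration only
toy_obo <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: GO:B",
  "name: toy child term",
  "namespace: biological_process",
  'def: "A made-up definition for illustration." [hypothetical:ref]',
  "is_a: GO:A ! toy parent term",
  "",
  "[Term]",
  "id: GO:A",
  "name: toy parent term",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
)
go_terms <- parse_obo_terms(toy_obo)
go_terms[["GO:B"]]$name   # "toy child term"
go_terms[["GO:B"]]$is_a   # "GO:A"
```

On a downloaded file (the path is an assumption), parse_obo_terms(readLines("go-basic.obo")) should give each term's name, namespace and definition, and lapply(go_terms, function(t) t$is_a) a child → parents map restricted to is_a edges.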
