How to retrieve all GO terms at level 2 from a list input
5.8 years ago

Hi everybody,

I have a list of GO terms (not GO IDs, but term names such as "pathogenesis" or "intracellular organelle part"), and for each one I want to get the ancestral GO term at level 2 ("biological process" and "cellular component", respectively). Is there an easy script to do that?

GO • 3.5k views

Hi all,

Thank you, Jean-Karim Heriche and Pierre Lindenbaum, for all the answers...

I have tried the solutions you suggested, but unfortunately I haven't been able to get them to work (I'm a beginner in programming). On the other hand, I found the API page on the QuickGO website (https://www.ebi.ac.uk/QuickGO/api/index.html#!/gene_ontology/findTermsCoreAttrUsingGET_1). I'm using a Jupyter notebook and Python to retrieve the ancestors of a specific GO term (I decided to start from a GO ID instead of a GO term name).

With code like this:

import requests, sys

requestURL = "https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/GO%3A0048527/ancestors?relations=is_a%2Cpart_of%2Coccurs_in%2Cregulates"

r = requests.get(requestURL, headers={ "Accept" : "application/json"})

if not r.ok:
  r.raise_for_status()
  sys.exit()

responseBody = r.text
print(responseBody)

I got this response:

{"numberOfHits":1,"results":[{"id":"GO:0048527","isObsolete":false,"name":"lateral root development","definition":{"text":"The process whose specific outcome is the progression of the lateral root over time, from its formation to the mature structure. A lateral root is one formed from pericycle cells located on the xylem radius of the root, as opposed to the initiation of the main root from the embryo proper."},"ancestors":["GO:0008150","GO:0090696","GO:0007275","GO:0009791","GO:0022622","GO:0048856","GO:0048528","GO:0048527","GO:0048364","GO:0048731","GO:0099402","GO:0032502","GO:0032501"],"children":[{"id":"GO:1901333","relation":"positively_regulates"},{"id":"GO:0010102","relation":"part_of"},{"id":"GO:1901332","relation":"negatively_regulates"},{"id":"GO:2000023","relation":"regulates"},{"id":"GO:1902089","relation":"part_of"}],"aspect":"biological_process","usage":"Unrestricted"}],"pageInfo":null}

Now I wonder how to access the last item of the "ancestors" key (GO:0032501) in Python. It seems that the response comes back as a unicode string instead of a dictionary. I'm thinking of writing a function so that I can find the ancestors for every GO ID in my input list. I've tried a few things, such as converting the unicode string into a Python dictionary, but I still can't access the "ancestors" key.
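One way to get at the "ancestors" key is to parse the response text as JSON first. The following is a minimal sketch, not a definitive solution: the helper name `last_ancestor` and the shortened `sample` string are made up for illustration, and it assumes the response has the same shape as the one shown above. Whether the last item really is the term you want depends on how QuickGO orders the list, which the response itself does not say.

```python
import json

def last_ancestor(response_text):
    """Return the last entry of the 'ancestors' list of the first result."""
    data = json.loads(response_text)       # str -> dict
    first_result = data["results"][0]      # 'results' is a list of term records
    return first_result["ancestors"][-1]   # last item of the ancestors list

# Shortened, hypothetical copy of the response shown above.
sample = (
    '{"numberOfHits":1,"results":[{"id":"GO:0048527",'
    '"ancestors":["GO:0008150","GO:0032502","GO:0032501"]}],"pageInfo":null}'
)

print(last_ancestor(sample))
```

With the `requests` code above, `r.json()` should give the same dictionary as `json.loads(r.text)`, so `r.json()["results"][0]["ancestors"]` is another way to reach the list.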

5.8 years ago

You have several options:
- use owltools
- use go-perl
- use the Bioconductor package GO.db

5.8 years ago

It seems like you may be overcomplicating this. If you just want to trace the terms back to Biological Process, Cellular Component, or Molecular Function, you could simply pull the Aspect (may also be abbreviated P,F,C depending on your sources) from the term info.
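As a rough illustration of this idea, the QuickGO response quoted earlier in the thread already carries an "aspect" field for each term. The sketch below only reads that field; the function name `aspects_from_response`, the `ASPECT_CODES` table, and the shortened `sample` string are assumptions made for the example, not part of any library.

```python
import json

# One-letter abbreviations that some annotation sources use for the aspects.
ASPECT_CODES = {
    "biological_process": "P",
    "molecular_function": "F",
    "cellular_component": "C",
}

def aspects_from_response(response_text):
    """Map each GO ID in a QuickGO-style response to its aspect."""
    data = json.loads(response_text)
    return {term["id"]: term["aspect"] for term in data["results"]}

# Shortened, hypothetical copy of a QuickGO response.
sample = (
    '{"results":[{"id":"GO:0048527","name":"lateral root development",'
    '"aspect":"biological_process"}]}'
)

aspects = aspects_from_response(sample)
print(aspects)
print(ASPECT_CODES[aspects["GO:0048527"]])
```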

5.4 years ago
dthorbur ★ 2.6k

Whilst a little late, I've been looking for something similar and found a way to do it in R. This is an adaptation of code I found on the Bioconductor support forums. The script gets the children of a given GO term, but it wouldn't be difficult to adapt it to get ancestral terms instead.

To get all GO terms at level 4 that are descendants of the second-level term GO:0002376 (Immune System Process), you would run this:

library(GO.db)
library(org.Dr.eg.db) ## zebrafish annotation package from bioconductor

## Return the direct BP children of the given GO IDs (duplicates and NAs removed)
getAllBPChildren <- function(goids)
{
 ans <- unique(unlist(mget(goids, GOBPCHILDREN), use.names=FALSE))
 ans[!is.na(ans)]
}

level3_terms <- getAllBPChildren("GO:0002376")
level4_terms <- getAllBPChildren(level3_terms)

## map each level-4 GO ID to its annotated genes via the GO-to-Entrez map
level4_genes <- mget(intersect(level4_terms, keys(org.Dr.egGO2ALLEGS)), org.Dr.egGO2ALLEGS)

Additionally, I'll add this because levels of GO terms can be misleading. Because GO is a graph rather than a strict tree, "levels" can be fluid: one term can be a child of both a level 3 term and a level 4 term, for example. So, to check the number of terms that appear in both level 3 and level 4, you can run this:

length(intersect(level3_terms, level4_terms))

If you want Molecular Function instead, just change the BP to MF (i.e. use GOMFCHILDREN in place of GOBPCHILDREN).
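To make the point about fluid levels concrete, here is a small Python sketch on a made-up toy graph (the term names and the `PARENTS` table are hypothetical, not real GO data). It collects every level at which a term can be reached from the root, counting the root as level 1.

```python
# Hypothetical toy ontology: child -> list of parents (not real GO data).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["C", "B"],  # D is reachable via root -> B -> D and root -> A -> C -> D
}

def levels_of(term, parents):
    """Return the set of levels at which `term` occurs (the root is level 1)."""
    if not parents[term]:
        return {1}
    levels = set()
    for parent in parents[term]:
        # A term sits one level below each level of each of its parents.
        levels |= {level + 1 for level in levels_of(parent, parents)}
    return levels

for term in PARENTS:
    print(term, sorted(levels_of(term, PARENTS)))
```

In this toy graph, D ends up at both level 3 and level 4, which is the same situation that the `intersect()` check above counts.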

5.8 years ago

Using MySQL and the public UCSC server:

cat << EOF | mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D go
select 
    T1.acc,T1.name,
    T2.acc,T2.name,
    T3.acc,T3.name
from
    term as T1,
    term as T2,
    term as T3,
    term2term as X1,
    term2term as X2

where
    X1.term1_id = T2.id and
    X1.term2_id = T1.id and
    X2.term1_id = T3.id and
    X2.term2_id = T2.id and
    T1.name in ("pathogenesis")
EOF

Output:

+------------+--------------+------------+--------------------------------------------+------------+------------------------+
| acc        | name         | acc        | name                                       | acc        | name                   |
+------------+--------------+------------+--------------------------------------------+------------+------------------------+
| GO:0009405 | pathogenesis | GO:0044419 | interspecies interaction between organisms | GO:0051704 | multi-organism process |
+------------+--------------+------------+--------------------------------------------+------------+------------------------+
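The query above climbs exactly two steps, which happens to be enough for "pathogenesis". For terms that sit deeper, the same walk can be repeated until the root is reached. Below is a minimal Python sketch of that idea using an in-memory parent table. The function name `ancestors_below_root` and the `PARENTS` table are hypothetical; the first two edges come from the output above, while the edge from GO:0051704 to the root GO:0008150 (biological_process) is an assumption added for the example.

```python
# child -> list of parents. The first two edges mirror the query output above;
# the last edge (to the root GO:0008150) is an assumption for this sketch.
PARENTS = {
    "GO:0009405": ["GO:0044419"],  # pathogenesis
    "GO:0044419": ["GO:0051704"],  # interspecies interaction between organisms
    "GO:0051704": ["GO:0008150"],  # multi-organism process
    "GO:0008150": [],              # root of the aspect
}

def ancestors_below_root(term, parents, root):
    """Collect every ancestor of `term` that is a direct child of `root`."""
    found = set()
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for parent in parents.get(current, []):
            if parent == root:
                found.add(current)    # `current` sits directly under the root
            else:
                stack.append(parent)  # keep climbing
    return found

print(ancestors_below_root("GO:0009405", PARENTS, "GO:0008150"))
```

The function returns a set because a GO term can have several parents, so more than one term directly under the root may come back.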