How to look up GO terms associated with a certain organism?
7.9 years ago
Diego ▴ 50

I want to get all the GO terms under the biological process GO domain for any organism. Right now, I need all the biological processes associated with all gene products of E. coli (ideally non-IEA).

I'm new to the field of bioinformatics (I'm a software engineer by education) and right now I'm struggling with the large number of interconnected databases I'm finding.

At first, I tried the R library biomaRt[1], but I discovered it no longer supports the Ensembl Bacteria database.

Then I tried to do it by hand using AmiGO (actually GOlr[2]), but I noticed that GO Central focuses on human health-related annotations[3], so I assumed I would lose a lot of useful annotations for non-human species. After that I wanted to try QuickGO, but with so many databases, repositories and tools around, it's hard to know how exhaustive my query results will be.

Should I stick with the Ensembl databases and ditch biomaRt as a library? Or is using a combination of GOlr (AmiGO) and GAnnotation (QuickGO) enough?

References

gene ontology gene product annotation biomaRt • 7.3k views
7.9 years ago
EagleEye 7.6k

Check out Gene Set Clustering based on Functional annotation (GeneSCF)

Please use a recent version of GeneSCF for successful results.

You can extract all GO terms from Gene Ontology for your desired organism (in the example below, Escherichia coli [ecocyc]) using a simple GeneSCF command:

(Complete GO for E. coli)

./prepare_database -db=GO_all -org=ecocyc

OR (Biological Process)

./prepare_database -db=GO_BP -org=ecocyc

OR (Molecular Functions)

./prepare_database -db=GO_MF -org=ecocyc

OR (Cellular Components)

./prepare_database -db=GO_CC -org=ecocyc
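
If you want to script the four calls above instead of typing them one by one, a minimal Python sketch could look like this. It only reuses the `-db` and `-org` values shown above; the function names `build_commands` and `run_all` are made up for illustration, and it assumes you run it from the directory that contains `prepare_database`.

```python
import shlex
import subprocess

# Database names copied from the commands above.
GO_DATABASES = ["GO_all", "GO_BP", "GO_MF", "GO_CC"]


def build_commands(organism, databases=GO_DATABASES, script="./prepare_database"):
    """Return one argument list per database, mirroring the commands above."""
    return [[script, f"-db={db}", f"-org={organism}"] for db in databases]


def run_all(organism, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(organism):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the tool exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("ecocyc")  # dry run: only prints the four commands
```

Calling `run_all("ecocyc", dry_run=False)` would actually launch the commands, one after the other.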

Advantages

  • Real-time analysis: you do not have to wait for enrichment tools to update their databases.

  • Easy for computational biologists to integrate into their NGS pipelines.

  • GeneSCF supports more organisms.

  • Enrichment analysis for multiple gene lists in a single run.

  • Enrichment analysis for multiple gene lists using multiple source databases (GO, KEGG, REACTOME and NCG) in a single run.

  • Download complete GO terms/pathways/functions with their associated genes as a simple table in a plain text file (check "Two step process" in the "GeneSCF USAGE" section).


You have an awesome tool right there, and it sounds exactly like something I would use. Reading more about your work, I stumbled upon this. Where exactly are you getting the data? Are you "just" using GO, KEGG, REACTOME and NCG? Also, why does REACTOME_web have more results than GeneSCF if the latter uses the former as a source?

Thanks for your answer, I'm really excited to test this tool. It sounds close to the Holy Grail I was searching for.


Thanks for your valuable feedback.

  • The current version of GeneSCF is restricted to data from Gene Ontology, KEGG, Reactome and NCG. Future versions may add support for more databases (with more surprises).

  • It is possible that the Reactome FTP site provides an earlier version than ReactomeWeb. As we described in the publication, GeneSCF accesses the CURRENT version of the Reactome dataset from 'http://www.reactome.org/download/current/ReactomePathways.gmt.zip' (it is possible that ReactomeWeb uses a more recent, regularly and minimally altered dataset, while only a processed stable version is provided via FTP).

For more information on GeneSCF, read the publication, ask your questions on Biostar, or email santhilal.subhash@gu.se.
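
To check a difference like this yourself, one option is to load two gene-set files and compare which pathways each contains. The sketch below assumes the usual GMT layout (set name, description, then genes, all tab-separated) and that the zip archive holds a single GMT file; `parse_gmt`, `load_gmt_zip` and `compare` are hypothetical helper names, not part of GeneSCF or Reactome.

```python
import io
import zipfile


def parse_gmt(lines):
    """Return {set name: set of genes} from GMT-formatted lines."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        pathways[name] = {g for g in genes if g}
    return pathways


def load_gmt_zip(path_or_file):
    """Parse the first member of a zip archive as a GMT file."""
    with zipfile.ZipFile(path_or_file) as archive:
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            return parse_gmt(io.TextIOWrapper(raw, encoding="utf-8"))


def compare(a, b):
    """Summarise which set names appear in only one of two parsed GMT files."""
    names_a, names_b = set(a), set(b)
    return {
        "only_in_a": sorted(names_a - names_b),
        "only_in_b": sorted(names_b - names_a),
        "shared": sorted(names_a & names_b),
    }
```

For example, `compare(load_gmt_zip("ReactomePathways.gmt.zip"), parse_gmt(open("other.gmt")))` would list the pathways missing from either side, where `other.gmt` stands in for whatever second export you want to check.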

7.9 years ago

The Gene Ontology is an ontology of gene functions. It annotates molecular functions (agnostic across species), cellular components (also agnostic across species) and biological processes (somewhat to mostly agnostic across species). If you need something specific to E. coli, take a look at http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki


Thanks, I'll take a look at the website! I know GO is species-agnostic by design, but what I need is a list, as exhaustive as possible, of all gene products annotated to the GO term "biological process" (GO:0008150) for a certain organism. My problem is that I don't know where to ask, since I don't know how exhaustive the results I'm getting are. For example, here I'm querying all gene products annotated to GO:0008150 and filtering by both E. coli and E. coli K-12 (taxon IDs 562 and 83333 respectively). A quick processing of that data would then give me all biological process GO terms associated with E. coli, but given that AmiGO, QuickGO and Ensembl all exist, I don't know whether I'm missing other databases to look up.

I'll check the wiki; it will probably help me understand this particular case better.
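
For the "quick processing" step, here is a rough Python sketch, assuming the annotations are available as a GAF 2.x file (tab-separated, with the gene symbol in column 3, qualifier in column 4, GO ID in column 5, evidence code in column 7, aspect in column 9 and taxon in column 13). The function name `bp_terms` is made up for illustration, and skipping annotations with a NOT qualifier is my own choice rather than something the thread asks for.

```python
def bp_terms(gaf_lines, taxon_ids=("562", "83333"), skip_evidence=("IEA",)):
    """Map each biological-process GO ID to the gene symbols annotated to it."""
    wanted = {f"taxon:{t}" for t in taxon_ids}
    terms = {}
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # header/comment or blank line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete GAF record
        symbol, qualifier, go_id = cols[2], cols[3], cols[4]
        evidence, aspect, taxon = cols[6], cols[8], cols[12]
        if aspect != "P":
            continue  # keep biological process only (F and C are the other aspects)
        if evidence in skip_evidence:
            continue  # drop electronically inferred annotations by default
        if "NOT" in qualifier.split("|"):
            continue  # assumption: negated annotations are not wanted
        # The first taxon listed is the organism of the gene product itself.
        if taxon.split("|")[0] not in wanted:
            continue
        terms.setdefault(go_id, set()).add(symbol)
    return terms
```

With a downloaded file (the path `ecoli.gaf` is a placeholder), usage would be along the lines of `with open("ecoli.gaf") as fh: terms = bp_terms(fh)`, after which `sorted(terms)` gives the list of biological process GO IDs. Running the same function over exports from different sources and comparing the resulting key sets is one way to see how much each source actually adds.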
