What I want is to retrieve the GO terms associated with the genes of one organism directly from the Ensembl Bacteria database.
I know this was possible at a certain point after Ensembl Genomes abandoned the BioMart suite (according to this post), but I couldn't find anything in the documentation. In particular, I searched in
but I couldn't find anything about GO terms.
I know this was possible back then with BioMart:
- Obtaining a list of genes of a certain organism (e.g. E. coli, Tax ID 562)
- Selecting the mart ensemblbacteria
- Using the dataset of the organism you needed
- Filtering the dataset by your gene IDs and retrieving the 'go_id' attribute from the filtered rows
but now that's no longer an option. I really don't know how to do this now, or even whether it's possible at all.
I know there are tools (like GeneSCF) and services (like QuickGO) that can do similar things, but that's exactly why I'm trying to do this: I want to benchmark the results I obtain from Ensembl and compare (and complement) them with the results from those other sources.
Hi! Thanks for your answer, of course this will be useful! Isn't there a way to ask for all the genes of a given species? Is it possible to restrict the results with the 'compara' parameter?
Greetings
The REST API is a good fit for extracting subsets of the Ensembl Bacteria database. You can get the GO info via the REST API with Perl, Python, Ruby, Java, curl or wget for hundreds of genes, or perhaps even for the entire genome of your favourite bacterium (around 4,000 genes?). For all genes in any given species you can also access the data via the Ensembl Perl API, if you know Perl. The Compara analysis has not been done for all bacterial species in Ensembl, only for a subset of 202 genomes.
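As a rough illustration of the REST route, here is a minimal Python sketch. It assumes the server is https://rest.ensembl.org, that the `xrefs/id/:id` endpoint accepts the `external_db` and `all_levels` parameters, and that each returned cross-reference carries `dbname` and `primary_id` fields; please check these against the current REST documentation before relying on them. The helper names (`go_xrefs_url`, `extract_go_ids`, `go_terms_for_genes`) are made up for this example.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: this server also serves Ensembl Bacteria data.
REST_SERVER = "https://rest.ensembl.org"


def go_xrefs_url(gene_id, server=REST_SERVER):
    """Build the xrefs URL for one stable gene ID, restricted to GO."""
    # all_levels=1 is assumed to also pull xrefs attached to the gene's
    # transcripts and translations, which is where GO terms usually sit.
    query = urlencode({"external_db": "GO", "all_levels": 1})
    return f"{server}/xrefs/id/{quote(gene_id)}?{query}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_go_ids(xrefs):
    """Return the sorted, de-duplicated GO accessions from an xrefs list."""
    return sorted(
        {
            xref["primary_id"]
            for xref in xrefs
            if xref.get("dbname") == "GO" and xref.get("primary_id")
        }
    )


def go_terms_for_genes(gene_ids, fetch=fetch_json):
    """Map each gene ID to its GO accessions, one request per gene."""
    return {
        gene_id: extract_go_ids(fetch(go_xrefs_url(gene_id)))
        for gene_id in gene_ids
    }
```

With a list of stable gene IDs for your organism in `my_gene_ids` (a placeholder name), `go_terms_for_genes(my_gene_ids)` would return a dictionary from gene ID to GO accessions. This makes one request per gene, so for a whole genome of around 4,000 genes you would want to add a pause between requests and handle HTTP errors.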