I have a list of GO names, and I want to associate each name with its GO ID.
I can recover the information by looking up each GO name on QuickGO, but I have thousands of names to match with IDs.
Any idea how I could do that?
Using mysql:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D go -e 'select acc,name from term where name in ("DNA replication","fatty-acyl-CoA binding")'
+------------+------------------------+
| acc        | name                   |
+------------+------------------------+
| GO:0006260 | DNA replication        |
| GO:0000062 | fatty-acyl-CoA binding |
+------------+------------------------+
Most software that does this type of annotation will include the GO IDs in its output; what was used to generate these names? Otherwise, you can write a fairly simple script to parse the GO OBO file: http://geneontology.org/page/download-ontology
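To make the OBO suggestion concrete, here is a minimal Python sketch. It assumes the usual OBO layout, where each term is a `[Term]` stanza with an `id:` line followed by a `name:` line. The function names `parse_obo_names` and `annotate` are made up for this example, and the sample stanzas are a tiny hand-written excerpt, not the real file. It matches exact primary names only (no synonyms, no case folding).

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_obo_names(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {term name: GO ID} lookup from the lines of an OBO file."""
    name_to_id: Dict[str, str] = {}
    in_term = False
    term_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas describe GO terms.
            in_term = (line == "[Term]")
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
        elif in_term and line.startswith("name: ") and term_id is not None:
            name_to_id[line[len("name: "):]] = term_id
    return name_to_id


def annotate(names: Iterable[str],
             name_to_id: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Pair every query name with its GO ID, or None when it is not found."""
    return [(name, name_to_id.get(name)) for name in names]


# Hand-written sample standing in for the downloaded OBO file.
sample = """[Term]
id: GO:0006260
name: DNA replication
namespace: biological_process

[Term]
id: GO:0000062
name: fatty-acyl-CoA binding

[Typedef]
id: part_of
name: part of
""".splitlines()

lookup = parse_obo_names(sample)
print(annotate(["DNA replication", "not a GO term"], lookup))
```

With a real download you would pass the open file handle instead of `sample`, read your names from a text file (one per line), and check the `None` entries by hand, since those are names that did not match exactly.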
Using SPARQL:
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?go
WHERE {
?go rdfs:label "DNA replication"@en .
}
NB: depending on the version of GO that you are using, you may not need the '@en'
s-query --service=http://localhost:3030/go/query --output=text "SELECT DISTINCT ?go WHERE { ?go <http://www.w3.org/2000/01/rdf-schema#label> 'DNA replication'@en . }"
-------------------------------------------
| go                                      |
===========================================
| <http://purl.org/obo/owl/GO#GO_0006260> |
-------------------------------------------
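For thousands of names, one option is to batch them into a single query rather than running one query per name. The Python sketch below only builds the query string and converts result URIs back to GO IDs; it does not contact any endpoint. It assumes the endpoint supports the SPARQL 1.1 `VALUES` clause and that term URIs end in `GO_` plus seven digits, as in the output above. The helper names (`sparql_literal`, `build_query`, `uri_to_go_id`) are made up for this example.

```python
import re
from typing import Iterable, Optional

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def sparql_literal(text: str, lang: Optional[str] = "en") -> str:
    """Quote a name as a SPARQL string literal, with an optional language tag."""
    # Escape backslashes first, then double quotes.
    # Assumption: names contain no newlines or other control characters.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    literal = f'"{escaped}"'
    return f"{literal}@{lang}" if lang else literal


def build_query(names: Iterable[str], lang: Optional[str] = "en") -> str:
    """Build one query that looks up several labels via a VALUES clause."""
    values = " ".join(sparql_literal(name, lang) for name in names)
    return (
        "SELECT DISTINCT ?go ?label WHERE { "
        f"VALUES ?label {{ {values} }} "
        f"?go <{RDFS_LABEL}> ?label . }}"
    )


def uri_to_go_id(uri: str) -> Optional[str]:
    """Turn a term URI ending in GO_0006260 (optionally with '>') into GO:0006260."""
    match = re.search(r"GO_(\d{7})>?$", uri)
    return f"GO:{match.group(1)}" if match else None


print(build_query(["DNA replication", "fatty-acyl-CoA binding"]))
print(uri_to_go_id("<http://purl.org/obo/owl/GO#GO_0006260>"))
```

The resulting string should be usable in place of the quoted query in the s-query command. Pass `lang=None` if your version of GO stores labels without the '@en' tag, and split very long name lists into chunks, since endpoints may limit query size.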
I'm not familiar with SPARQL; I think it's some kind of distributed database. You run s-query to connect to a service running on localhost:3030, which requires the user's computer to have some kind of database server running. Could you elaborate on what's going on here? This looks like a tool I might like to get loaded on my workstation too!
Yes, because I have a local SPARQL server for GO, but you could just as well use the BioPortal SPARQL endpoint (http://sparql.bioontology.org/). It requires registration if you want to query it programmatically, though.
I just happen to find SPARQL pretty convenient for querying GO and GOA.
The GeneSCF tool has a folder called 'annotation' where you can find plain text files with GO processes and their IDs:
Gene Set Clustering based on Functional annotation (GeneSCF)
In R:
library(GO.db)
goterms = Term(GOTERM)
go_definition = merge(your_list_of_GOs, goterms, by.x = "category", by.y = "row.names")
I blasted my genomes against nr and loaded the results into Blast2GO to obtain GO annotations. Unfortunately, I am missing the scripting part. Do you know of any Bioconductor package that does this, or of any good link I could use to get these data?