How to use biomart to link KEGG pathway ID to GO terms?
You can get GO terms that are linked to KEGG pathways via the KEGG API.
This Ruby script, go.rb, uses BioRuby to extract the GO term(s):
require 'bio'
# Read in pathway ID from the command line:
pathway_id = ARGV[0]
# Connect to the public KEGG API server:
server = Bio::KEGG::API.new
# Retrieve a single pathway:
pathway_sheet = server.get_entries(["PATHWAY:#{pathway_id}"])
# Turn the textual representation into a Ruby object:
pathway = Bio::KEGG::PATHWAY.new(pathway_sheet)
# Check if there is a DB link to GO:
if pathway.dblinks.has_key?('GO')
  # Print each GO term on a separate line:
  pathway.dblinks['GO'].each do |term|
    puts "GO:#{term}"
  end
end
You can use this script on the command line as follows:
$ ruby go.rb hsa04020
GO:0019722
$ ruby go.rb hsa04210
GO:0006915
...
In each case the script prints the GO term(s) linked to the given pathway, e.g. GO:0019722 for hsa04020.
Hope that helps.
UPDATE:
An R solution using the Bioconductor package KEGGSOAP:
# For installing Bioconductor and the KEGGSOAP package, run:
# source("http://bioconductor.org/biocLite.R")
# biocLite("KEGGSOAP")
library(KEGGSOAP)
# Get the textual representation of the pathway:
# (For now, there is no function like get.genes.by.pathway for getting dblinks.)
pathway <- bget("PATHWAY:hsa04020")
# Split the very long textual description into individual lines:
pathway.lines <- unlist(strsplit(pathway, "\n"))
# Create an empty vector for storing GO terms of the pathway:
pathway.go.terms <- c()
# Create a variable that is TRUE while we are processing the DBLINKS section:
in.dblinks <- FALSE
# Go through the pathway description line by line:
for (n in seq_along(pathway.lines)) {
  line <- pathway.lines[n]
  # If we are in the DBLINKS section, detect when we leave it again
  # (continuation lines start with a space, a new section does not):
  if (in.dblinks && substring(line, 1, 1) != " ")
    in.dblinks <- FALSE
  # When we see the beginning of the DBLINKS section, note it:
  if (!in.dblinks && substring(line, 1, 8) == "DBLINKS ")
    in.dblinks <- TRUE
  # If we are in the DBLINKS section, look for GO entries (starting at column 13) and save them:
  if (in.dblinks && substring(line, 13, 15) == "GO:")
    pathway.go.terms <- append(pathway.go.terms, substring(line, 13))
}
# pathway.go.terms now holds the text of each GO line in DBLINKS (the individual GO IDs still need to be split out).
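For comparison, the same line-by-line DBLINKS scan as the R loop can be written in plain Ruby, without BioRuby or network access, by running it on a saved flat-file entry. This is a minimal sketch that assumes the layout the R code relies on (field names in columns 1-12, continuation lines indented with spaces, GO links written as `GO: 0019722 ...`). The method name `go_terms_from_flat_file` is hypothetical, and the sample entry is illustrative text, not actual KEGG output. Unlike the R loop, it also splits each GO line into individual IDs.

```ruby
# Hypothetical helper: extract GO IDs from the DBLINKS section of a
# KEGG-style flat-file entry. Assumed layout: field name in columns 1-12,
# continuation lines start with spaces, GO links look like "GO: 0019722 ...".
def go_terms_from_flat_file(text)
  terms = []
  in_dblinks = false
  text.each_line do |raw|
    line = raw.chomp
    # A line that does not start with a space begins a new section.
    in_dblinks = false if in_dblinks && !line.start_with?(" ")
    # The DBLINKS section starts with the field name in column 1.
    in_dblinks = true if !in_dblinks && line.start_with?("DBLINKS ")
    next unless in_dblinks
    # Column 13 onwards (0-based index 12) holds "DB: id id ...".
    entry = line[12..] || ""
    next unless entry.start_with?("GO:")
    entry.sub(/\AGO:/, "").split.each { |id| terms << "GO:#{id}" }
  end
  terms
end

# Illustrative input only (assumed layout, not real KEGG output).
sample = <<~TEXT
  ENTRY       hsa04020                    Pathway
  NAME        Calcium signaling pathway
  DBLINKS     GO: 0019722
  GENE        ...
TEXT

puts go_terms_from_flat_file(sample)
```

Because it only needs the entry text, the same method works whether the entry was saved by hand or fetched with whatever client is available.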
I tried calling Ruby from R as described here: http://www.r-bloggers.com/calling-ruby-perl-or-python-from-r/, but on Windows I first need to install Ruby and everything else...
R/Bioconductor has multiple KEGG-related packages: http://bioconductor.org/help/search/index.html?q=kegg. KEGGSOAP may do what you want.
I know this is a dead thread, but I wanted to do roughly the same thing as the original poster and found that KEGG's LinkDB system works pretty well. It was easy to pull up a list of all KO-to-GO term matches, and it looks like it can be used for various other mappings too, although I haven't tried them all.
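Once such a list of KO-to-GO matches has been saved, it is convenient to group it by KO. A minimal Ruby sketch, assuming the matches were stored as a plain two-column, tab-separated text file with one `KO<TAB>GO` pair per line; that file layout and the method name `load_links` are assumptions, not a documented LinkDB export format, and the IDs below are placeholders.

```ruby
require 'stringio'

# Hypothetical helper: read "source<TAB>target" lines (e.g. KO<TAB>GO)
# into a Hash mapping each source ID to its unique target IDs.
def load_links(io)
  links = {}
  io.each_line do |line|
    source, target = line.chomp.split("\t", 2)
    # Skip blank or malformed lines.
    next if source.nil? || target.nil? || source.empty? || target.empty?
    targets = (links[source] ||= [])
    targets << target unless targets.include?(target)
  end
  links
end

# Illustrative input with placeholder IDs (not real KEGG data):
input = StringIO.new("K1\tGO:X\nK1\tGO:Y\nK2\tGO:X\nK1\tGO:X\n")
p load_links(input)
```

For a real file, pass an opened file instead of the StringIO, e.g. `File.open("ko_go_links.tsv")` (a hypothetical file name).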
Via transitivity (GO <-> Orthology (KO terms), Orthology <-> PubMed ID, PubMed ID <-> Pathway), the KEGG API/LinkDB lets you build a many-to-many linkage map between GO terms and pathways that isn't directly available (although it is marked 'routed' on the official page). This has to be done explicitly.
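The transitive linking described above amounts to composing one ID-to-IDs map with the next and inverting the result where needed. A minimal Ruby sketch, assuming each mapping has already been loaded into a Hash from an ID to a list of linked IDs; the method names `compose_links` and `invert_links` are hypothetical, and all IDs are placeholders rather than real KEGG or GO data.

```ruby
# Invert a Hash of ID => [linked IDs] into linked ID => [IDs].
def invert_links(links)
  inverted = {}
  links.each do |source, targets|
    targets.each { |target| (inverted[target] ||= []) << source }
  end
  # Drop duplicates created when the same pair appears more than once.
  inverted.each_value(&:uniq!)
  inverted
end

# Compose two link maps: if a -> b and b -> c, then a -> c.
def compose_links(a_to_b, b_to_c)
  a_to_c = {}
  a_to_b.each do |a, bs|
    cs = bs.flat_map { |b| b_to_c.fetch(b, []) }.uniq
    a_to_c[a] = cs unless cs.empty?
  end
  a_to_c
end

# Placeholder mappings: GO -> KO and KO -> pathway.
go_to_ko = { "GO:X" => ["K1"], "GO:Y" => ["K1", "K2"] }
ko_to_pathway = { "K1" => ["map_A"], "K2" => ["map_A", "map_B"] }

# GO -> pathway, then inverted to pathway -> GO.
pathway_to_go = invert_links(compose_links(go_to_ko, ko_to_pathway))
p pathway_to_go
```

A longer chain such as GO -> KO -> PubMed -> pathway is just a repeated composition, `compose_links(compose_links(go_to_ko, ko_to_pubmed), pubmed_to_pathway)`, and the gene-based route from the last reply works the same way with gene IDs as the shared key. Each extra hop can add links, so the result is only as specific as the loosest mapping in the chain.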
P.S. On the other hand, I do question the validity of this mapping. A GO ID describes a gene (product), while a KEGG ID describes a pathway. By doing the above, we throw away quite a lot of background information, because we represent a whole pathway merely by gene-level annotations.
Thanks @Neilfws: Maybe I need to link via genes (gene -> GO; gene -> KEGG) and then extrapolate from KEGG to GO.