How to Extract GO Terms from a Given KEGG ID
12.4 years ago
Rm 8.3k

How can I use BioMart to link a KEGG pathway ID to GO terms?

biomart kegg go • 9.2k views
12.4 years ago
Neilfws 49k

I don't think this is possible using most web-based implementations of BioMart, since the underlying database does not contain KEGG identifiers.

The closest I can find to what you want is this file, mapping KEGG reaction IDs to GO terms.


Thanks @Neilfws. Maybe I need to link via genes: gene -> GO and gene -> KEGG, then infer KEGG -> GO.
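The gene-based join suggested above could look roughly like this. It is a minimal sketch, assuming you already have two lookup tables keyed by gene ID. The gene names and their annotations below are made-up placeholders, and `pathway_to_go` is a hypothetical helper name, not part of any library.

```ruby
require 'set'

# Toy annotation tables (assumed data, for illustration only).
GENE_TO_GO = {
  'geneA' => ['GO:0019722', 'GO:0006915'],
  'geneB' => ['GO:0019722'],
  'geneC' => ['GO:0006915']
}.freeze

GENE_TO_PATHWAY = {
  'geneA' => ['hsa04020'],
  'geneB' => ['hsa04020'],
  'geneC' => ['hsa04210']
}.freeze

# Join the two tables on gene ID and return pathway => sorted, unique GO terms.
def pathway_to_go(gene_to_pathway, gene_to_go)
  result = Hash.new { |hash, key| hash[key] = Set.new }
  gene_to_pathway.each do |gene, pathways|
    terms = gene_to_go.fetch(gene, []) # genes without GO annotation add nothing
    pathways.each { |pathway| result[pathway].merge(terms) }
  end
  result.transform_values { |terms| terms.to_a.sort }
end

mapping = pathway_to_go(GENE_TO_PATHWAY, GENE_TO_GO)
mapping.each { |pathway, terms| puts "#{pathway}\t#{terms.join(',')}" }
```

Note that a pathway inherits every GO term of every member gene, so the result is usually much broader than a curated pathway-to-GO link.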

12.4 years ago
Joachim ★ 2.9k

You can get GO terms that are linked to KEGG pathways via the KEGG API.

This Ruby script, go.rb, uses BioRuby to extract GO term(s):

require 'bio'

# Read in pathway ID from the command line:
pathway_id = ARGV[0]

# Connect to the public KEGG API server:
server = Bio::KEGG::API.new

# Retrieve a single pathway:
pathway_sheet = server.get_entries(["PATHWAY:#{pathway_id}"])

# Turn the textual representation into a Ruby object:
pathway = Bio::KEGG::PATHWAY::new(pathway_sheet)

# Check if there is a DB link to GO:
if pathway.dblinks.has_key?('GO') then
    # Print each GO term on a separate line:
    pathway.dblinks['GO'].each { |term|
        puts "GO:#{term}"
    }
end

You can use this script on the command line as follows:

$ ruby go.rb hsa04020
GO:0019722
$ ruby go.rb hsa04210
GO:0006915
...

This prints the GO term(s) linked to whichever pathway ID you pass in (hsa04020 and hsa04210 in the examples above).

Hope that helps.

UPDATE:

An R solution using the KEGGSOAP package from Bioconductor:

# For installing Bioconductor and the KEGGSOAP package, run:
# source("http://bioconductor.org/biocLite.R")
# biocLite("KEGGSOAP")

library(KEGGSOAP)

# Get the textual representation of the pathway:
# (For now, there is no function like get.genes.by.pathway for getting dblinks.)
pathway <- bget("PATHWAY:hsa04020")

# Split the very long textual description into individual lines:
pathway.lines <- unlist(strsplit(pathway, '\n'))

# Create an empty vector for storing GO terms of the pathway:
pathway.go.terms <- c()

# Create a variable that is set to TRUE when we are processing the DBLINKS section:
in.dblinks <- FALSE

# Go through the pathway description line-by-line:
# (seq_along avoids iterating over c(1, 0) when there are no lines.)
for (n in seq_along(pathway.lines)) {
  # If we are in the DBLINKS section, figure out when we leave it again:
  if (in.dblinks && substring(pathway.lines[n], 1, 1) != " ")
    in.dblinks <- FALSE

  # When we see the beginning of the DBLINKS section, jot this down:
  if (!in.dblinks && substring(pathway.lines[n], 1, 8) == "DBLINKS ")
    in.dblinks <- TRUE

  # If we are in the DBLINKS section, then look out for GO terms and save them:
  if (in.dblinks && substring(pathway.lines[n], 13, 15) == "GO:")
    pathway.go.terms <- append(pathway.go.terms, substring(pathway.lines[n], 13))
}

# The GO terms of the pathway are now accumulated in the vector pathway.go.terms.
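For comparison, the same DBLINKS scan can be written in plain Ruby without any network call. This is only a sketch: it assumes the flat-file layout implied by the R code (field names start in column 1, values start in column 13, continuation lines start with a space). `go_terms_from_entry` is a hypothetical helper, and the sample entry is hand-written for illustration, not a real KEGG record.

```ruby
# Extract GO IDs from the DBLINKS section of a KEGG-style flat-file entry.
def go_terms_from_entry(entry_text)
  in_dblinks = false
  terms = []
  entry_text.each_line do |raw|
    line = raw.chomp
    if line.start_with?('DBLINKS')
      in_dblinks = true             # entering the DBLINKS section
    elsif !line.start_with?(' ')
      in_dblinks = false            # a new field name ends the section
    end
    next unless in_dblinks

    # Skip the 12-character field-name column, then look for "GO: <ids>".
    body = line[12..-1].to_s.strip
    next unless body.start_with?('GO:')

    ids = body.sub('GO:', '').split
    terms.concat(ids.map { |id| "GO:#{id}" })
  end
  terms
end

# Hand-written sample in the assumed layout (illustrative only).
SAMPLE_ENTRY = <<~ENTRY
  ENTRY       demo00001                   Pathway
  NAME        Demo pathway
  DBLINKS     GO: 0019722
              GO: 0006915
  ORGANISM    Demo organism
ENTRY

puts go_terms_from_entry(SAMPLE_ENTRY)
```

Unlike the R loop, this version normalizes each hit to the compact `GO:nnnnnnn` form instead of keeping the raw text from column 13 onward.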

Thanks @Joachim: Any R alternative?


Well, there is always: go <- system2("ruby", "go.rb hsa04020", stdout=TRUE)


I tried something similar, as described here: http://www.r-bloggers.com/calling-ruby-perl-or-python-from-r/. On Windows, though, I would need to install Ruby and everything that goes with it.


I updated my answer with an R solution. Big thanks to Neil for pointing out KEGGSOAP. Too bad that a get.dblinks.by.pathway function has not been implemented yet though.


@Joachim: +1. Thanks for the R update too. I also appreciate that you commented the code step by step.


R/Bioconductor has multiple KEGG-related packages: http://bioconductor.org/help/search/index.html?q=kegg. KEGGSOAP may do what you want.


Thanks @Neilfws: I will give it a try...

7.1 years ago
daveshire ▴ 10

I know this is a dead thread, but I wanted to do roughly the same thing as the first poster and found that KEGG's LinkDB system works pretty well. It was easy to pull up a list of all KO-to-GO term matches, and it looks like it can be used for various other mappings, but I haven't tried them all.

http://www.genome.jp/linkdb/

5.7 years ago

By transitivity (GO <-> Orthology (KO terms), Orthology <-> PubMed ID, PubMed ID <-> Pathway), the KEGG API/LinkDB lets you build a many-to-many linkage map between GO and pathway terms that isn't directly available (although it is marked 'routed' on the official page). You have to assemble this map explicitly yourself.

P.S. That said, I do question the validity of this mapping. A GO ID is indicative of a gene, while a KEGG ID is indicative of a pathway. By doing the above, we throw away quite a lot of background information by representing a pathway merely by a gene.
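The transitive chain described above amounts to composing many-to-many relations one hop at a time. Here is a minimal sketch, assuming each link table has already been downloaded into a Hash of arrays. All identifiers below are invented placeholders, and `compose` is a hypothetical helper, not a KEGG or LinkDB function.

```ruby
# Compose two many-to-many relations (key => array of values) into one.
def compose(first, second)
  first.each_with_object({}) do |(key, mids), out|
    targets = mids.flat_map { |mid| second.fetch(mid, []) }.uniq.sort
    out[key] = targets unless targets.empty? # drop keys with no route
  end
end

# Placeholder link tables (assumed data, not real GO/KO/PubMed/pathway IDs).
GO_TO_KO        = { 'go1' => %w[ko1 ko2], 'go2' => %w[ko3] }.freeze
KO_TO_PMID      = { 'ko1' => %w[pmid1], 'ko2' => %w[pmid2], 'ko3' => %w[pmid2] }.freeze
PMID_TO_PATHWAY = { 'pmid1' => %w[path1], 'pmid2' => %w[path2] }.freeze

# Walk the chain GO -> KO -> PubMed ID -> pathway.
go_to_pathway = [KO_TO_PMID, PMID_TO_PATHWAY].reduce(GO_TO_KO) do |acc, relation|
  compose(acc, relation)
end

go_to_pathway.each { |go, pathways| puts "#{go}\t#{pathways.join(',')}" }
```

Every hop can only widen or drop links, so the final map says that a route exists between a GO term and a pathway, not that the association is curated.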
