Question

KEGG Data Download

13.5 years ago

Siva Kumar ▴ 30

I was using the KEGG API to download certain information related to enzymes and pathways. The KEGG API expects certain inputs for some of its methods. For example, the call get_enzymes_by_compound(string:compound_id) expects a compound_id; a sample compound ID given in the KEGG API reference manual is cpd:C00345. But there is no specific function to get all the mappings between compounds and their corresponding internal IDs. The same is true of glycan IDs, reaction IDs, etc. Has anyone used the KEGG API to download this information, and if so, where do input parameters like the compound ID, enzyme ID, reaction ID, etc. come from? Please help me in this regard. Thank you in advance.

kegg api pathway

Updated 13.4 years ago by Joachim; written 13.5 years ago by Siva Kumar

I don't know if you have had the same experience as me, but retrieving information (sequences especially) through the KEGG API is very slow. If someone has access to, or has generated, a mapping file, I would be very interested in it.

Reply by Manu Prestat, 13.5 years ago

score 1 · Answer 1 · 2012-02-14

13.5 years ago

Joachim ★ 2.9k

Hi!

If I understand you correctly, you are having trouble obtaining some of the parameters for KEGG API calls. In particular, you do not know how to retrieve all compound IDs in KEGG.

Here is how you can get all compound IDs in human pathways with BioRuby. First, install the required gems:

sudo gem install bio
sudo gem install soap4r-ruby1.9    # if you are using Ruby 1.9

Now let's write a little Ruby program, 'compounds.rb', that outputs the pathways and the compounds appearing in them:

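#!/usr/bin/env ruby
# compounds.rb: print every (pathway, compound) pair for the human pathways in KEGG

require 'bio'

# Client for the KEGG API (SOAP) provided by BioRuby
serv = Bio::KEGG::API.new

# All pathways for the organism code 'hsa' (Homo sapiens)
pathways = serv.list_pathways('hsa')
pathways.each do |pathway|
  # Compound IDs (e.g. "cpd:C00022") that appear in this pathway
  compounds = serv.get_compounds_by_pathway(pathway.entry_id)
  compounds.each do |compound|
    # One tab-separated line per pathway/compound pair
    puts "#{pathway.entry_id}\t#{compound}"
  end
end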

Running the program with 'ruby compounds.rb' produces tab-separated output like this:

path:hsa00010   cpd:C00022
path:hsa00010   cpd:C00024
path:hsa00010   cpd:C00031
path:hsa00010   cpd:C00033
path:hsa00010   cpd:C00036

You can modify and extend the program to collect just the unique set of compounds for further processing. If you only need a list of compound IDs, you can simply run:

ruby compounds.rb | cut -f 2 | sort | uniq
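If you would rather stay in Ruby than use cut/sort/uniq, the same tab-separated output can be turned into an index from each compound ID to the pathways it appears in; the keys of that index are then the unique compound list. This is a minimal sketch: index_compounds is a hypothetical helper name, and it assumes only the two-column "pathway<TAB>compound" format shown above, nothing about the KEGG API itself.

```ruby
require 'set'

# Hypothetical helper: builds { compound_id => [pathway_ids] } from lines of
# the form "path:hsa00010<TAB>cpd:C00022" (the output format of compounds.rb).
def index_compounds(lines)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    pathway, compound = line.chomp.split("\t")
    next if pathway.nil? || compound.nil?   # skip blank or malformed lines
    index[compound] << pathway             # a Set ignores duplicate pathways
  end
  # Sort compounds and their pathways so the result is deterministic;
  # the keys of the returned hash are the unique compound IDs.
  index.keys.sort.map { |compound| [compound, index[compound].to_a.sort] }.to_h
end
```

For example, after saving the output with 'ruby compounds.rb > compounds.tsv', calling index_compounds(File.foreach('compounds.tsv')) returns the index (compounds.tsv is just an illustrative file name). Keeping the pathway membership costs little extra and saves a second pass if you later need it.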

Hope this helps,

Joachim

That is a good point, Hamish. However, people need to be aware that bulk downloads of KEGG are not free anymore: http://www.bioinformatics.jp/docs/subscription_schedule.pdf

Reply by Joachim, 13.5 years ago

Yes, KEGG FTP downloads are now subscription-only (as noted on the page I linked, along with details of why this option was chosen). However, attempting to use the web services for bulk downloads can lead to you or your organisation being blacklisted by KEGG, so it is worth considering a subscription if you need to do this. If you and your colleagues use KEGG a fair amount, your organisation may already have a subscription in place, and you just have to ask around to get hold of the data files.

Reply by Hamish, 13.5 years ago

It is worth noting that if you need a large chunk of the data, it can be more efficient to download the required data sets and process them locally instead of using the web services. Details of how to download the KEGG data can be found at http://www.kegg.jp/kegg/download/.
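Once the data files are available locally, the ID-to-name mapping asked about in the question can be built by parsing them directly. The sketch below assumes the usual KEGG flat-file layout (field names such as ENTRY and NAME in the first 12 columns, continuation lines indented, each record terminated by "///"); check this against the files you actually receive. parse_kegg_flat and entry_names are hypothetical helper names, not part of BioRuby or any KEGG tool. BioRuby also includes KEGG flat-file classes such as Bio::KEGG::COMPOUND, which may be the better choice in practice.

```ruby
# Hypothetical minimal parser, ASSUMING the KEGG flat-file layout: each record
# ends with "///", the first 12 columns hold the field name, and continuation
# lines leave those 12 columns blank.
def parse_kegg_flat(text)
  records = []
  current = {}
  field = nil
  text.each_line do |line|
    line = line.chomp
    if line.start_with?('///')
      records << current unless current.empty?
      current = {}
      field = nil
      next
    end
    label = line[0, 12].to_s.strip
    value = line[12..-1].to_s.strip
    field = label unless label.empty?   # blank label: continuation of previous field
    next if field.nil?
    (current[field] ||= []) << value
  end
  records << current unless current.empty?   # tolerate a missing final "///"
  records
end

# Map each ENTRY ID (e.g. "C00001") to its first NAME, without the trailing ";".
def entry_names(records)
  records.each_with_object({}) do |rec, map|
    next unless rec['ENTRY'] && rec['NAME']
    id = rec['ENTRY'].first.split.first
    map[id] = rec['NAME'].first.sub(/;\z/, '')
  end
end
```

For example, entry_names(parse_kegg_flat(File.read('compound'))) would map compound IDs to their primary names, where 'compound' stands for whatever the compound data file is called in your download. Working from local files this way avoids one web-service call per entry.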

Reply by Hamish, 13.5 years ago