Have you thought about using the KEGG API? See the following links for more information:
Also, BioRuby seems to have a pretty good API implemented:
As does the R Bioconductor KEGGSOAP package:
The following (simple) Python script should work a treat for now though ;)
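"""
Python script to retrieve KEGG gene entries for a number of different genes
Coded by Steve Moss (gawbul [at] gmail [dot] com)
http://about.me/gawbul
"""
from SOAPpy import WSDL

# Build a SOAP proxy from the KEGG WSDL service description
kegg_wsdl = 'http://soap.genome.jp/KEGG.wsdl'
kegg_service = WSDL.Proxy(kegg_wsdl)

gene_names = ("ALDOA", "BHLHB3", "PKM2", "P4HA1", "EPO")

for gene_name in gene_names:
    # bfind returns one line per hit, each starting with the KEGG entry ID (organism:gene)
    gene_entries = kegg_service.bfind("genes " + gene_name + " hsa").rstrip("\n").split("\n")
    # Drop empty lines so that a search with no hits is not counted as one entry
    gene_entries = [entry for entry in gene_entries if entry]
    print("Found %d entries for %s" % (len(gene_entries), gene_name))
    for gene_entry in gene_entries:
        # The first space-separated field is the KEGG ID; fetch that entry with bget
        results = kegg_service.bget("-f " + gene_entry.split(" ")[0])
        print(results)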
You could modify this to read the gene names from a file, and perhaps also write the output to a file instead of printing it to STDOUT.
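KEGG has since retired the SOAP service the script above relies on and replaced it with the KEGG REST API at rest.kegg.jp, so here is a minimal sketch of that file-in/file-out variant written against the REST API instead. It assumes the `find/<organism>/<query>` operation returns tab-separated lines whose first field is the KEGG ID, and that `get/<id>` returns the flat-file entry. The helper names (`read_gene_names`, `entry_ids`, `fetch_entries`, `run`) and the 404-means-no-hits handling are my own assumptions, not part of any KEGG library.

```python
"""Sketch: read gene names from a file, fetch KEGG entries, write them to a file.

Assumes the KEGG REST API (rest.kegg.jp) in place of the retired SOAP service.
"""
import urllib.error
import urllib.parse
import urllib.request

KEGG_REST = "https://rest.kegg.jp"


def fetch_text(url):
    """Return the body of an HTTP GET request as text."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Assumption: treat a 404 as "nothing found" rather than a failure
        if err.code == 404:
            return ""
        raise


def read_gene_names(path):
    """Read one gene name per line, skipping blank lines and # comments."""
    names = []
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                names.append(stripped)
    return names


def entry_ids(find_output):
    """Extract KEGG IDs (first tab-separated field) from 'find' output."""
    return [line.split("\t")[0] for line in find_output.splitlines() if line.strip()]


def fetch_entries(gene_names, out_handle, organism="hsa", fetch=fetch_text):
    """Look up each gene name and write every matching entry to out_handle.

    Returns a dict mapping gene name -> number of entries found.
    """
    counts = {}
    for name in gene_names:
        query = urllib.parse.quote(name, safe="")
        ids = entry_ids(fetch("%s/find/%s/%s" % (KEGG_REST, organism, query)))
        counts[name] = len(ids)
        for kegg_id in ids:
            out_handle.write(fetch("%s/get/%s" % (KEGG_REST, kegg_id)))
    return counts


def run(in_path, out_path, organism="hsa", fetch=fetch_text):
    """Read gene names from in_path and write all matching entries to out_path."""
    names = read_gene_names(in_path)
    with open(out_path, "w") as out:
        counts = fetch_entries(names, out, organism=organism, fetch=fetch)
    for name, n in counts.items():
        print("Found %d entries for %s" % (n, name))
    return counts
```

With a file such as genes.txt holding one gene name per line (ALDOA, BHLHB3, ...), calling `run("genes.txt", "entries.txt")` writes the entries to entries.txt and prints the same "Found N entries" summary as the original script. Passing `fetch` as a parameter keeps the network call swappable, which makes it easy to test offline.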
Essentially, this uses the SOAP/WSDL framework to expose the equivalent of the DBGET HTTP URLs in a form a program can call (a web service). You can build queries with the KEGG API just as you would a URL: e.g. kegg_service.bget("-f hsa:aldoa") is the same as visiting http://www.genome.jp/dbget-bin/www_bget?-f+hsa:aldoa, except that the response comes back to the script as a SOAP (XML) message, which SOAPpy unpacks into a plain string, rather than as an HTML page for the browser.
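To make that URL correspondence concrete, here is a tiny helper that turns a bget-style argument string into the DBGET browser URL quoted above. It just joins the space-separated arguments with "+"; that rule is an assumption inferred from the single example URL, not a documented DBGET specification, and `bget_url` is a hypothetical name.

```python
DBGET_URL = "http://www.genome.jp/dbget-bin/www_bget"


def bget_url(query):
    """Map a bget-style query such as '-f hsa:aldoa' to a DBGET browser URL.

    Assumption: arguments are separated by '+' in the query string, as in
    the example URL; no other characters are escaped.
    """
    return DBGET_URL + "?" + "+".join(query.split())


print(bget_url("-f hsa:aldoa"))
# http://www.genome.jp/dbget-bin/www_bget?-f+hsa:aldoa
```

The same argument string ("-f hsa:aldoa") is what the script passes to kegg_service.bget, which is the point: the web service and the browser URL take the same query, only the transport and response format differ.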
Hi, Steve. Thank you for providing so many resources. Problem solved.
No problem :) Glad to be of assistance!