I am trying to manually calculate the pathway enrichment for some genes. I have downloaded the PANTHER and Reactome data files, which contain the pathway annotations for genes. However, I couldn't find similar files for the KEGG pathway database.
The file I downloaded from ftp://ftp.genome.jp/pub/kegg/pathway/pathway contains only the description of each pathway, without the gene lists. Could someone point me to a gene-to-pathway association file for KEGG?
I have also tried the KEGG Python API, but none of the following calls returned a list:
from SOAPpy import WSDL
wsdl = 'http://soap.genome.jp/KEGG.wsdl'
serv = WSDL.Proxy(wsdl)
serv.get_pathways_by_genes(['ENSG00000120328'])
serv.get_pathways_by_genes(['ENST00000361510'])
serv.get_pathways_by_genes(['OPA1'])
It did return a pathway list when I used KEGG gene identifiers such as:
serv.get_pathways_by_genes(['eco:b0077', 'eco:b0078'])
But my genes are identified by either Ensembl gene IDs or official gene symbols. Is there a way to find the correspondence between this kind of eco gene identifier and Ensembl genes?
Thanks
Note: it seems you have posted two questions: how to parse KGML files and how to convert KEGG IDs to Ensembl IDs. It would be easier to answer if you split them into separate questions.
Note that access to KEGG's FTP site now requires a commercial subscription (http://www.genome.jp/kegg/docs/plea.html). Some of the resources referred to in the answers in this thread may not be available without that subscription.
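For readers without FTP access, a gene-to-pathway table can probably be assembled from the KEGG REST service instead. The following is only a minimal sketch, assuming that the `link/pathway/<organism>` operation at https://rest.kegg.jp returns tab-separated lines of the form `gene<TAB>path:pathway`. The function names `parse_link_table` and `fetch_gene_pathway_links` are hypothetical helpers introduced here, and the sample lines are illustrative rather than real KEGG output.

```python
from collections import defaultdict
from urllib.request import urlopen

# Assumption: base URL of the KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"


def parse_link_table(text):
    """Turn 'gene<TAB>pathway' lines into {gene_id: sorted list of pathway ids}."""
    gene_to_pathways = defaultdict(set)
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene, pathway = fields
        # Drop the assumed 'path:' prefix if it is present.
        if pathway.startswith("path:"):
            pathway = pathway[len("path:"):]
        gene_to_pathways[gene].add(pathway)
    return {gene: sorted(paths) for gene, paths in gene_to_pathways.items()}


def fetch_gene_pathway_links(organism="hsa"):
    """Download and parse the table for one KEGG organism code (needs network access)."""
    with urlopen(f"{KEGG_REST}/link/pathway/{organism}") as response:
        return parse_link_table(response.read().decode("utf-8"))


# Offline demonstration on illustrative lines in the assumed format.
sample = (
    "eco:b0077\tpath:eco00290\n"
    "eco:b0077\tpath:eco00650\n"
    "eco:b0078\tpath:eco00290\n"
)
links = parse_link_table(sample)
print(links)
```

Calling `fetch_gene_pathway_links("hsa")` would then give the human table, if the service behaves as assumed. Human KEGG gene identifiers are based on NCBI Gene IDs, so Ensembl IDs or gene symbols would still need to be mapped to NCBI Gene IDs through a separate resource before they can be looked up in this table.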