Trying to get the hang of using KEGG data, and I'm having a hard time figuring out where to start. I have a list of metagenomic genes that have been annotated with KO numbers. There are a couple of things I would like to do with this data that I haven't figured out how to do yet:
- Visualize the full metabolic network: I know iPath does something like this, but I'm not sure how to turn my list of KOs into the correct input format for iPath. Additionally, I would prefer a tool I can run locally.
- Get a list of nodes and edges within the network. This would be ideal for input into a network program like Cytoscape or Python's networkx. How do I get from KOs to nodes and edges?
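For the iPath part, a minimal sketch of turning a KO list into pasteable input, assuming iPath's custom-selection format of one identifier per line optionally followed by a hex color and a `W<width>` line width (that format, and the `#ff0000`/`W10` defaults, are assumptions; check the iPath help page before relying on them). The helper name `to_ipath_lines` is hypothetical.

```python
def to_ipath_lines(kos, color="#ff0000", width=10):
    """Build iPath-style selection text: one 'KO color Wwidth' line per unique KO.

    The line format is an assumption about iPath's custom-selection input.
    """
    seen = set()
    lines = []
    for ko in kos:
        ko = ko.strip()
        # Annotation tools often emit 'ko:K00844'; keep only the bare K number.
        if ko.startswith("ko:"):
            ko = ko[3:]
        if not ko or ko in seen:
            continue  # skip blanks and duplicates, preserving first-seen order
        seen.add(ko)
        lines.append(f"{ko} {color} W{width}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ipath_lines(["K00844", "ko:K01810", "K00844"]))
```

Deduplicating matters for metagenomes, where many genes typically map to the same KO; if you want abundance reflected in the map, you could instead scale `width` by a per-KO count.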
I think these are pretty basic tasks for KEGG data, but I'm stuck. Any help/links to resources would be much appreciated.
One problem you'll encounter is that the KEGG FTP is not free; they changed it to a subscriber-based system last year. To get at the nodes and edges, you'll need their pathway files, which are located on the FTP. I had to resort to web scraping some of their data.
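Assuming the pathway files in question are KGML (KEGG Markup Language) files, here is a minimal sketch of pulling KO nodes and edges out of one with the standard library. It assumes KO entries appear as `<entry type="ortholog" name="ko:K...">` and that edges come from `<relation entry1=... entry2=... type=...>` elements; an individual map's KGML may also be retrievable through KEGG's REST interface (for example `https://rest.kegg.jp/get/ko00010/kgml`), but treat that URL as an assumption to verify. The function name `kgml_to_edges` is hypothetical.

```python
import xml.etree.ElementTree as ET


def kgml_to_edges(kgml_text, my_kos):
    """Return (nodes, edges) restricted to my_kos from one KGML document.

    edges are (ko_a, ko_b, relation_type) tuples taken from <relation> elements.
    """
    root = ET.fromstring(kgml_text.strip())
    # Map each entry id to the set of KO ids it represents (minus the 'ko:' prefix).
    entry_kos = {}
    for entry in root.findall("entry"):
        if entry.get("type") != "ortholog":
            continue
        names = entry.get("name", "").split()
        entry_kos[entry.get("id")] = {n.split(":", 1)[1] for n in names if n.startswith("ko:")}

    wanted = set(my_kos)
    nodes = set()
    for kos in entry_kos.values():
        nodes |= kos & wanted

    edges = set()
    for rel in root.findall("relation"):
        a = entry_kos.get(rel.get("entry1"), set()) & wanted
        b = entry_kos.get(rel.get("entry2"), set()) & wanted
        for ka in a:
            for kb in b:
                if ka != kb:
                    edges.add((ka, kb, rel.get("type")))
    return sorted(nodes), sorted(edges)
```

The resulting tuples drop straight into networkx, e.g. `G = nx.Graph(); G.add_nodes_from(nodes); G.add_edges_from((a, b, {"type": t}) for a, b, t in edges)`, or can be written out as a tab-separated edge list for Cytoscape. An undirected graph is the safer default here, since relations such as `ECrel` link enzymes that share a compound rather than encoding a direction.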
Did you use their API for that? Or is there some other database you would recommend?
I web scraped it. I literally just downloaded most of their HTML pages with curl and parsed the information with a script. Their web service is pretty amenable to this kind of heavy-handed data mining. However, I don't recommend doing this; admittedly, it's kind of a dick move on my part. It's not really meant to be used that way and probably causes a lot of unnecessary traffic load.
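A lighter-weight scripted route than scraping HTML pages is KEGG's REST interface, which returns plain tab-separated text. The sketch below assumes the `link/pathway/ko` operation returns lines like `ko:K00844<TAB>path:ko00010` (plus parallel `path:map...` lines); the URL and output shape are assumptions to check against the KEGG API documentation, and the helper names are hypothetical. Only `fetch_ko_pathway_links` touches the network; the parsing works on any text in that format.

```python
import urllib.request
from collections import defaultdict

# Assumed endpoint; verify against the KEGG REST API docs before use.
KO_PATHWAY_URL = "https://rest.kegg.jp/link/pathway/ko"


def fetch_ko_pathway_links(url=KO_PATHWAY_URL):
    """Download the KO-to-pathway link table as text (one request, not per-page scraping)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def pathways_for_kos(link_text, my_kos):
    """Map each of my_kos to the sorted list of 'ko' reference pathways containing it."""
    wanted = set(my_kos)
    result = defaultdict(set)
    for line in link_text.splitlines():
        if not line.strip():
            continue
        ko_field, path_field = line.split("\t")
        ko = ko_field.split(":", 1)[1]
        path = path_field.split(":", 1)[1]
        # Keep 'ko' pathway ids and skip the duplicate 'map' ids.
        if ko in wanted and path.startswith("ko"):
            result[ko].add(path)
    return {ko: sorted(paths) for ko, paths in result.items()}
```

The pathway ids this yields are what you would feed to the KGML step to decide which maps to pull, so a metagenome's KO list touches only the maps it actually needs.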