KEGG and Network Visualization: Getting Started
12.1 years ago
miz

I'm trying to get the hang of using KEGG data, and I'm having a hard time figuring out where to start. I have a list of metagenomic genes that have been annotated with KO numbers. There are a couple of things I would like to do with this data that I haven't yet figured out how to do:

  1. Visualize the full metabolic network: I know iPath does something like this, but I'm not sure how to turn my list of KOs into the correct input format for iPath. I would also prefer a tool I can run locally.
  2. Get a list of nodes and edges within the network. This would be ideal as input for a network program like Cytoscape or Python's NetworkX. How do I get from KOs to nodes and edges?

I think these are pretty basic tasks for KEGG data, but I'm stuck. Any help/links to resources would be much appreciated.


One problem you'll encounter is that the KEGG FTP is not free. They changed it to a subscriber-based system last year. To get at the nodes/edges, you'll need their pathway files, which are located on their FTP site. I had to resort to web scraping some of their data.
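Once you do have a pathway file, getting from KOs to nodes and edges is mostly a parsing job. The sketch below assumes the file is in KEGG's KGML (XML) layout, where `<entry type="ortholog">` elements carry space-separated `ko:` identifiers and `<relation entry1=… entry2=…>` elements link entry ids. The function name `kgml_to_edges` and the toy pathway fragment are hypothetical illustrations, not real KEGG data, and only relations between KOs in your own list are kept.

```python
import xml.etree.ElementTree as ET

def kgml_to_edges(kgml_text, my_kos):
    """Hypothetical helper: return (nodes, edges) for KOs from my_kos found in a KGML pathway.

    Assumes KGML-style <entry> and <relation> elements; edges are
    (source_ko, target_ko, relation_type) triples.
    """
    root = ET.fromstring(kgml_text)
    wanted = set(my_kos)

    # Map each ortholog entry id to the KO numbers it lists (e.g. "ko:K00844 ko:K12407").
    entry_kos = {}
    for entry in root.iter("entry"):
        if entry.get("type") != "ortholog":
            continue
        names = entry.get("name", "").split()
        entry_kos[entry.get("id")] = [n.split(":", 1)[1] for n in names if n.startswith("ko:")]

    # Every KO of yours that appears in the map becomes a node, even without edges.
    nodes = {ko for kos in entry_kos.values() for ko in kos if ko in wanted}

    # Relations point at entry ids; expand them to KO-to-KO edges within your list.
    edges = set()
    for rel in root.iter("relation"):
        for a in entry_kos.get(rel.get("entry1"), []):
            for b in entry_kos.get(rel.get("entry2"), []):
                if a in wanted and b in wanted:
                    edges.add((a, b, rel.get("type")))
    return sorted(nodes), sorted(edges)

# Toy KGML-shaped fragment for illustration only (not a real KEGG file).
SAMPLE_KGML = """<pathway name="path:ko00010" org="ko" number="00010">
  <entry id="1" name="ko:K00844 ko:K12407" type="ortholog"/>
  <entry id="2" name="ko:K01810" type="ortholog"/>
  <entry id="3" name="cpd:C00031" type="compound"/>
  <relation entry1="1" entry2="2" type="ECrel">
    <subtype name="compound" value="3"/>
  </relation>
</pathway>"""

if __name__ == "__main__":
    nodes, edges = kgml_to_edges(SAMPLE_KGML, ["K00844", "K01810", "K99999"])
    print(nodes)
    print(edges)
```

The edge triples drop straight into NetworkX with `G.add_edges_from((a, b, {"type": t}) for a, b, t in edges)` on a `networkx.DiGraph`; the parser itself uses only the standard library, so it has no extra dependency.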


Did you use their API for that? Or is there some other database you would recommend?


I web scraped it. I literally just downloaded most of their HTML pages with curl and parsed the information with a script. Their web service is pretty amenable to this kind of heavy-handed data mining. However, I don't recommend doing this; admittedly, it's kind of a dick move on my part. The site isn't really meant to be used that way, and it probably causes a lot of unnecessary traffic load.
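A lighter-touch alternative, assuming KEGG's REST interface at `https://rest.kegg.jp` is available to you (it is not discussed elsewhere in this thread), is to request only the records you need instead of whole HTML pages. The sketch assumes the `link/pathway` operation returns tab-separated `ko:Kxxxxx<TAB>path:…` lines and that `get/<pathway>/kgml` returns a pathway's KGML file; the helper names here are hypothetical. Nothing is fetched unless you call `fetch` yourself.

```python
import urllib.request
from collections import defaultdict

REST_BASE = "https://rest.kegg.jp"  # assumed KEGG REST base URL

def link_url(kos):
    """Hypothetical helper: URL asking which pathways each KO belongs to ('+' joins entries)."""
    return f"{REST_BASE}/link/pathway/" + "+".join(f"ko:{ko}" for ko in kos)

def kgml_url(pathway_id):
    """Hypothetical helper: URL for one pathway's KGML file, e.g. 'ko00010'."""
    return f"{REST_BASE}/get/{pathway_id}/kgml"

def parse_link_output(text):
    """Map KO -> set of KO reference pathways from tab-separated link output."""
    ko_to_paths = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        ko, path = line.split("\t")
        # Keep the KO reference maps (path:koNNNNN) and skip the path:map duplicates.
        if path.startswith("path:ko"):
            ko_to_paths[ko.split(":", 1)[1]].add(path.split(":", 1)[1])
    return dict(ko_to_paths)

def fetch(url):
    """Plain HTTP GET; call sparingly so you don't hammer the server."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    # Offline demo on a made-up response in the assumed format.
    demo = "ko:K00844\tpath:map00010\nko:K00844\tpath:ko00010\nko:K01810\tpath:ko00010\n"
    print(link_url(["K00844", "K01810"]))
    print(parse_link_output(demo))
    print(kgml_url("ko00010"))
```

The idea is to look up which pathways your KOs fall in once, then download one KGML file per pathway rather than crawling every page.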

12.1 years ago
Josh Herr

Like you, I am also working with metagenomic data, and over the last year I had a similar problem getting KEGG data for my metagenomic reads. As Damian says, the KEGG FTP is not free, and it's quite expensive; simply paying for it wasn't an option for us.

My solution was to use MG-RAST. You'll have to upload your data and run it through their pipeline, but once it has been processed you can download the KEGG information from the analysis section. I was then able to output the node and edge data and load it with the KGMLReader plugin in Cytoscape. The downside is that, depending on how much sequence data you have and on their server load, it can take up to a week to get your data through the MG-RAST pipeline. MG-RAST is a web server, so you won't be able to set up anything locally, but you can then take the tabular output and load it into Cytoscape on your own machine.
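If you end up with a KO-level edge list, whether from a web service or from parsing pathway files yourself, one simple way to get it into Cytoscape locally is the plain-text SIF format (`source<TAB>interaction<TAB>target` per line), plus a separate node table you can import as node attributes. This is a minimal sketch: the edge triples and KO hit list are made-up toy inputs, the helper names are hypothetical, and it does not read MG-RAST's own export, whose column layout isn't given in this thread.

```python
from collections import Counter

def to_sif(edges):
    """Render (source, target, interaction) triples as tab-separated SIF lines."""
    return "".join(f"{src}\t{kind}\t{dst}\n" for src, dst, kind in edges)

def ko_node_table(ko_hits):
    """Count how often each KO was assigned to your genes; header row plus one row per KO."""
    counts = Counter(ko_hits)
    rows = ["KO\tcount"] + [f"{ko}\t{n}" for ko, n in sorted(counts.items())]
    return "\n".join(rows) + "\n"

if __name__ == "__main__":
    edges = [("K00844", "K01810", "ECrel")]    # toy edge list
    ko_hits = ["K00844", "K01810", "K00844"]   # toy per-gene KO annotations
    with open("network.sif", "w") as fh:
        fh.write(to_sif(edges))
    with open("ko_counts.tsv", "w") as fh:
        fh.write(ko_node_table(ko_hits))
```

Keeping the abundance counts in their own table means the same network file can be recolored per sample by importing a different attribute file.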

There may be another web service out there that will provide you with the node/edge data. I agree with Damian: I'm sure the node/edge data can be found online somewhere, but it will take some serious searching and/or data-format manipulation.
