I would like to know what libraries/modules BioStar members use for large-scale analysis of networks (for example PPI, DDI etc). I am looking for a module that provides functions for running network algorithms and calculating network properties. Since I have to deal with thousands of networks, using Cytoscape through its GUI is not a feasible option. I think a command-line version of the Cytoscape plugin NetworkAnalyzer would be a good start.
I know about Bio::Network in BioPerl, but its functionality is very limited. NetworkX implements some network algorithms from a graph theory perspective, but its coverage is not as extensive as NetworkAnalyzer's.
I like igraph. It's available as an R package, a Ruby gem, a Python extension or a C library. It reads/writes most of the common network formats and provides a good selection of network statistics - see for example the R documentation.
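To give an idea of what "network statistics" means when you have to batch over many networks, here is a minimal sketch in plain Python (standard library only). It is not the igraph API; the function names (`build_graph`, `local_clustering`, `summarize`) and the toy edge lists are made up for illustration, and a real library will be much faster on large graphs.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops are ignored in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def summarize(edges):
    """Return a few simple topological properties for one network."""
    adj = build_graph(edges)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is counted twice
    return {
        "nodes": n,
        "edges": m,
        "density": 2.0 * m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": 2.0 * m / n if n else 0.0,
        "mean_clustering": sum(local_clustering(adj, v) for v in adj) / n if n else 0.0,
    }


# Hypothetical toy networks; in practice each edge list would be read from a file.
networks = {
    "triangle_with_tail": [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")],
    "path": [("A", "B"), ("B", "C")],
}
summaries = {name: summarize(edges) for name, edges in networks.items()}
for name, stats in summaries.items():
    print(name, stats)
```

The same loop-over-a-dictionary pattern should carry over to whichever library you pick: load each network, compute the properties, and collect one row per network.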
I wrote a very basic getting started tutorial - it focuses on social network data, but should easily adapt to any kind of network.
For visualisation I prefer Gephi to Cytoscape (although they are both RAM-guzzling Java monsters). I believe they are developing an API but it's primarily a visualisation GUI at the moment.
For large graph analysis I can fully recommend LEDA (Library of Efficient Data types and Algorithms). It is written in C++ but provides a very high-level interface that looks almost like a scripting language. Anyone who can program in any language can pick it up in hours. The library is very well optimized and has great performance.
For example here is a code fragment demonstrating both creating a graph and then calling the single source shortest path algorithm with the Dijkstra method:
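As an illustration of the same two steps (creating a graph, then running single-source shortest paths with Dijkstra's method), here is a minimal sketch in plain Python rather than LEDA. It does not use or imitate the LEDA API; the `dijkstra` function, the nested-dictionary graph layout and the edge weights are all hypothetical choices for this example.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths for non-negative edge weights.

    graph: {node: {neighbour: weight}} (directed; add both directions
    for an undirected network).
    Returns {node: distance} for every node reachable from source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Step 1: create a small weighted graph (toy data).
graph = {
    "s": {"a": 4.0, "b": 1.0},
    "b": {"a": 2.0, "c": 5.0},
    "a": {"c": 1.0},
    "c": {},
}

# Step 2: run Dijkstra from node "s".
distances = dijkstra(graph, "s")
print(distances)
```

A compiled library such as LEDA would be expected to handle far larger graphs than this sketch; the point here is only the shape of the computation.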
I recently learned about WGCNA, an R package for weighted correlation network analysis. It looks interesting and useful, and I plan to start using it. The package is a comprehensive collection of R functions covering network construction, module detection, gene selection, calculation of topological properties, data simulation, visualization, and interfacing with external software.
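For anyone wondering what "weighted correlation network" means in practice, here is a rough sketch of the network construction step in plain Python. It is not WGCNA code and does not call the package. It assumes an unsigned network where the edge weight between two genes is the absolute Pearson correlation raised to a soft-thresholding power `beta`; the value `beta=6`, the function names and the toy expression profiles are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated in this sketch
    return cov / (sx * sy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = sorted(profiles)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(profiles[g], profiles[h])) ** beta
    return adj


def connectivity(adj):
    """Weighted degree of each gene: the sum of its adjacencies."""
    k = {}
    for (g, h), a in adj.items():
        k[g] = k.get(g, 0.0) + a
        k[h] = k.get(h, 0.0) + a
    return k


# Hypothetical expression profiles (one list of samples per gene).
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with gene1
    "gene3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with gene1
    "gene4": [1.0, 3.0, 2.0, 4.0],   # correlation 0.8 with gene1
}
adjacency = soft_threshold_adjacency(profiles, beta=6)
k = connectivity(adjacency)
print(adjacency)
print(k)
```

Raising the correlation to a power pushes weak correlations towards zero while keeping strong ones, so no hard cutoff is needed to decide which edges exist.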
If you want to do network analysis on a really large scale (i.e., analysing many different networks), I can't imagine R or Cytoscape would be particularly good options. The former haemorrhages memory and the latter isn't (easily?) scriptable and its plugins aren't interoperable.
I'd recommend JUNG, which has loads of network analysis algorithms already coded up. JUNG also has graph IO modules etc. It's probably not perfect for what you want, because you'll have to stitch modules together yourself. However, they're a very helpful bunch.
Written 13.9 years ago by Russh; updated 6.3 years ago by Ram.
I would suggest Gremlin. It is a domain-specific language (DSL), originally influenced by XPath, for analyzing (think algorithms) and exploring (think traversing) graphs. The implementation is superb and the community vibrant. Gremlin sits on top of Blueprints, a graph API that "wraps" various types of graph providers, e.g. an OpenRDF Sail or the highly scalable Neo4j graph database, and provides implementations of other Java graph APIs, e.g. JUNG. This gives great flexibility: if your data grows, just switch the backend.
Thanks Neil. igraph looks like a good start for me. Your tutorial will be really useful.
Yesterday I heard from Gephi regarding their API: http://twitter.com/kshameer/status/18101377504