Networks: Fastest Way To Define Cliques And Connectors Between Them For Large Datasets
2
4
Entering edit mode
12.7 years ago
User 8257 ▴ 40

Hi everyone,

I'm looking for the fastest way to define cliques (communities) and edges that connect between them on large datasets (10^5-10^6 nodes). So far I've tried Cytoscape (it chokes) and R's igraph library (it doesn't choke, but it's quite slow). Would appreciate your advice on the best ways to do this in R - as well as alternative tools.

Many thanks!

network r graph • 4.6k views
ADD COMMENT
7
Entering edit mode
12.7 years ago

Finding maximal cliques is NP-complete, meaning that the time to complete the task increases so quickly that it doesn't take many nodes to make the computation impossible to complete before the heat death of the universe. Graph theory is a big literature and there's a lot of specialist work on approximating this problem, but in practice I don't think exact solutions are really required in bioinformatics analysis. I often compromise by finding all cliques of size three or four, which is tractable even at fairly large network sizes and it's a practical way to enrich for nodes that are part of larger cliques or near-cliques. Another heuristic method is to repeatedly select for highly connected nodes, as that will give you near-cliques.

I like the networkx python library for working with cliques and for working with graphs in general. There's also a graph library in the boost libraries if you speak C++; more work for the programmer than python but very fast.

ADD COMMENT
0
Entering edit mode

Thanks very much, very helpful!

ADD REPLY
7
Entering edit mode
12.7 years ago

Hi,

Cliques and communities are somewhat different. A clique is a maximally connected subgraph whereas communities are groupings/clusters in your network, defined by edge densities or some other metric.

Moses and/or CFinder should help you find community structure. In biological networks, overlapping communities are perhaps more realistic but this depends on your hypothesis. Also, if you are interested in finding particualr motifs or instances of cliques, e.g., all maximally connected 4 node groups, Graphgrep is useful.

D.

ADD COMMENT
0
Entering edit mode

Thanks a lot for your advice on the tools - and for pointing out the difference between cliques and communities!

ADD REPLY

Login before adding your answer.

Traffic: 2889 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6