Question

Compare the degree distribution of a real network with multiple random networks

6.5 years ago

The Last Word ▴ 230

Before the paper arguing that real-world networks are rarely scale-free, studies used to (and still do) check the slope of the degree distribution plot and, if the slope fell between 2 and 3, conclude that the network is a real-world network. The slopes of my networks are not between 2 and 3, so I decided to run a Kolmogorov-Smirnov test (KS test) to check whether the degree distribution of my network differs significantly from that of random networks I created with the same number of nodes and edges. My network is bipartite, so I used the sample.bipartite function in R to create the random networks.

Even though I created a set of 500 random networks, the KS test is a two-sample test, so I could only test the degree distribution of my network against one random network at a time. The problem is that the KS test gives me different answers (significant and not significant) for the same network, depending on which of the 500 random degree distributions I compare it with.

Is it possible to compare the degree distribution of my real-world network with the degree distributions of all 500 random networks at once, rather than getting differing answers by comparing one at a time? I read that the Kruskal-Wallis test might be able to do this, but I don't know if that's true.

I use GraphPad Prism for my stats analysis and my R is rusty, so please give me a test I can use in Prism, along with an example if possible. Thank you in advance for your help.

Or am I doing the whole thing wrong, and the KS test cannot be used to compare the degree distributions of two networks? Any ideas are welcome.
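One way to get a single answer against the whole ensemble is a Monte Carlo test: use the KS statistic only as a distance, and rank the real network's distance to the pooled random degrees against the distances that the random networks themselves show. The sketch below is in Python rather than Prism and uses only the standard library. It assumes the null model is a uniform random bipartite graph with fixed node and edge counts (comparable in spirit to what the question describes, but not a reimplementation of sample.bipartite), and every function name and the toy degree sequence are hypothetical. Because degrees are discrete and not independent of each other, the p-value here comes from the simulation, not from the KS reference distribution.

```python
import random
from bisect import bisect_right

def random_bipartite_degrees(n1, n2, m, rng):
    """Degree list of a uniform random bipartite graph with m distinct edges."""
    deg = [0] * (n1 + n2)
    for e in rng.sample(range(n1 * n2), m):   # m distinct (u, v) pairs
        u, v = divmod(e, n2)                  # u in the first set, v in the second
        deg[u] += 1
        deg[n1 + v] += 1
    return deg

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in set(a) | set(b))

def monte_carlo_p(observed, null_samples):
    """Empirical p-value of the observed degrees against the whole ensemble."""
    pooled = [d for s in null_samples for d in s]
    obs_stat = ks_statistic(observed, pooled)
    null_stats = []
    for i, s in enumerate(null_samples):
        # Leave sample i out of the pool so it is not compared with itself.
        rest = [d for j, t in enumerate(null_samples) if j != i for d in t]
        null_stats.append(ks_statistic(s, rest))
    extreme = sum(t >= obs_stat for t in null_stats)
    # The +1 terms keep the estimate away from an impossible p = 0.
    return (extreme + 1) / (len(null_samples) + 1), obs_stat

if __name__ == "__main__":
    rng = random.Random(1)
    n1, n2, m = 20, 30, 60
    # Hypothetical hub-dominated bipartite degree sequence (toy data only).
    observed = [30, 12] + [1] * 18 + [20] + [2] * 11 + [1] * 18
    ensemble = [random_bipartite_degrees(n1, n2, m, rng) for _ in range(99)]
    p, stat = monte_carlo_p(observed, ensemble)
    print(f"KS distance to pooled null: {stat:.3f}, empirical p: {p:.3f}")
```

The smallest p-value this procedure can report is 1 / (number of random networks + 1), so 500 random networks give a floor of about 0.002. A small p-value only says the degree distribution is unlikely under this particular random model.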

kolmogorov-smirnov network • 2.6k views

ADD COMMENT • link 6.5 years ago by The Last Word ▴ 230

Maybe you should step back and think more about why you're trying to do this. What question are you trying to address?

ADD REPLY • link 6.5 years ago by Jean-Karim Heriche 27k

Basically, I want to check and show that the degree distribution of my network is different from a random one, and state that it's not just a random collection of nodes but potentially has underlying functional properties.

ADD REPLY • link 6.5 years ago by The Last Word ▴ 230

Nice idea, but can the degree show that? I find the results of network analyses unstable: change one parameter and the whole result can change. The degree is also almost exclusively influenced by your cut-off for the edge values / weights. I guess that you could generate networks from random Gaussian-distributed data and compare your degree distribution to theirs with a chi-square test.

ADD REPLY • link 6.5 years ago by Kevin Blighe 89k

Reread my comment to your previous post and the references there. My view is that there's no point in doing this, given that one cannot in general extrapolate the degree distribution from a subnetwork to the whole network and that biological networks are known to be incomplete. Whatever the result, it only applies to the network instance you've studied and can't be generalized, because what you're studying is known to be incomplete data.

ADD REPLY • link 6.5 years ago by Jean-Karim Heriche 27k

I do understand that biological networks are incomplete and that I am comparing just this one instance. My network was computationally generated, so can't I use this methodology to test just this one instance?

ADD REPLY • link 6.4 years ago by The Last Word ▴ 230

Comparing your network instance to random networks will not give you much information. You could decide that it is unlikely to have been generated by the random model you selected, but that's not going to be conclusive. If you're interested in testing the biological relevance of your network, I would suggest finding other ways. For example, your network may contain modules/clusters of biological relevance (e.g. protein complexes or pathways). After all, the degree distribution only gives partial information about the network. Even if your network could be generated by a particular model, or by none of those you tested, that wouldn't say anything about its biological relevance. Going back to the original question: when you compare with 500 networks, you're doing 500 tests, so you should correct for multiple testing, as some tests will be positive just by chance.
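The multiple-testing correction mentioned above can be sketched as follows, assuming the 500 KS p-values have already been collected in a list. The function names are hypothetical, and the choice of Bonferroni and Benjamini-Hochberg is an assumption, since the reply does not name a method. Note that the 500 tests all reuse the same real network and are therefore not independent; Benjamini-Hochberg is usually justified under independence or positive dependence, so treat the adjusted values as a rough guide.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    # Hypothetical p-values standing in for a few of the 500 KS tests.
    raw = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(raw))
    print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

A correction controls false positives across the 500 comparisons, but it still leaves 500 separate answers instead of one.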

ADD REPLY • link 6.4 years ago by Jean-Karim Heriche 27k