Hi all,
I am looking for a statistical test for the following problem.
I believe such a test should exist, but I can't find one.
Suppose there are 1000 genes with many known connections between them. I have two subgroups of genes drawn from this population, each consisting of 5 genes, and I want to know whether the two groups share significantly more connections than expected by chance.
One way to answer this is to count the individual connections between genes in the two groups.
Suppose group A has genes 1, 2, 3, 4, 5 and group B has genes 6, 7, 8, 9, 10.
If any gene can be connected to any other gene, there are 5 × 5 = 25 possible connections between groups A and B.
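As an illustration of the counting step, here is a minimal Python sketch. It assumes the network is stored as a list of undirected gene pairs with no duplicates, and that groups A and B do not overlap; the function name `count_between` is hypothetical.

```python
def count_between(edges, group_a, group_b):
    """Count undirected edges with one endpoint in group_a and the other in group_b.

    Assumes `edges` is an iterable of (gene, gene) pairs with no duplicate
    edges, and that group_a and group_b are disjoint.
    """
    a, b = set(group_a), set(group_b)
    count = 0
    for u, v in edges:
        # An edge counts if it crosses between the groups in either direction.
        if (u in a and v in b) or (u in b and v in a):
            count += 1
    return count


# Toy example: three of these edges cross between A and B.
edges = [(1, 6), (7, 2), (1, 2), (6, 7), (5, 10)]
print(count_between(edges, [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

With A = {1, ..., 5} and B = {6, ..., 10}, the result can range from 0 to 25.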
Now, if there are actually 23 or 24 connections between the groups, you can probably say the groups are connected. On the other hand, if there are only 2 or 3, then they probably are not.
But what statistical test can estimate the p-value more systematically?
I think I can build a null distribution from which to calculate the p-value.
Since all genes and all connections are given, I can repeatedly sample random groups (of the same sizes as groups A and B) and count the connections between them to build the distribution. Then I can compute the p-value of the observed number of connections between groups A and B with respect to that distribution.
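The simulation described above is essentially a permutation (Monte Carlo) test. A minimal sketch follows, assuming the same edge-list format as before, that the two random groups should be disjoint like A and B, and that a one-sided test ("at least as many connections as observed") is what is wanted. The function names, `n_iter`, and `seed` are hypothetical choices for illustration.

```python
import random


def build_adjacency(edges):
    """Map each gene to the set of genes it is connected to (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_between(adj, group_a, group_b):
    """Count connected pairs (u, v) with u in group_a and v in group_b.

    Assumes the groups are disjoint, so each crossing edge is counted once.
    """
    return sum(1 for u in group_a for v in group_b if v in adj.get(u, ()))


def permutation_pvalue(genes, edges, group_a, group_b, n_iter=10000, seed=0):
    """One-sided Monte Carlo p-value for the number of A-B connections."""
    rng = random.Random(seed)
    genes = list(genes)
    adj = build_adjacency(edges)
    observed = count_between(adj, group_a, group_b)
    na, nb = len(group_a), len(group_b)
    at_least = 0
    for _ in range(n_iter):
        # Draw na + nb distinct genes and split them into two disjoint groups.
        sample = rng.sample(genes, na + nb)
        if count_between(adj, sample[:na], sample[na:]) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_iter + 1)
```

Using an adjacency map means each simulated pair of groups costs only |A| × |B| lookups rather than a pass over every edge, which matters when there are many edges and many iterations. The smallest p-value this can report is 1 / (n_iter + 1), so n_iter bounds the resolution.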
But I suspect there is a more systematic way to do this. Can you please show me what it is?
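One candidate closed-form alternative (not something stated in the question) is a hypergeometric test. It assumes a simple null model in which the network's K edges are placed uniformly at random among all G(G-1)/2 gene pairs. Under that assumption, the number of edges falling among the |A| × |B| tested pairs is hypergeometric, and the one-sided p-value is P(X ≥ observed). This sketch uses only `math.comb`; the function name is hypothetical.

```python
from math import comb


def hypergeom_pvalue(n_genes, n_edges, size_a, size_b, observed):
    """P(X >= observed) for edges between two disjoint gene groups.

    Null model (an assumption): the n_edges edges are spread uniformly at
    random over all n_genes * (n_genes - 1) / 2 possible gene pairs.
    """
    total_pairs = n_genes * (n_genes - 1) // 2  # M: all possible pairs
    tested_pairs = size_a * size_b              # n: pairs between A and B
    denom = comb(total_pairs, tested_pairs)
    upper = min(tested_pairs, n_edges)
    # Sum the hypergeometric probabilities from `observed` up to the maximum.
    tail = sum(
        comb(n_edges, i) * comb(total_pairs - n_edges, tested_pairs - i)
        for i in range(observed, upper + 1)
    )
    return tail / denom


# Example with a hypothetical edge count: 1000 genes, 5000 edges,
# two groups of 5 genes, 3 observed connections between them.
print(hypergeom_pvalue(1000, 5000, 5, 5, 3))
```

This model ignores the fact that real gene networks usually have highly connected hub genes, so it can overstate significance when A or B contains hubs. The simulation approach above uses the actual network and may be safer in that respect. Under either approach, how many connections count as "significant" depends on how dense the overall network is.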
Thanks