I don't believe that there is any program that does this specifically. What we're talking about here is the correlation between one gene and another. If they are highly and statistically significantly correlated (positively or inversely), then it's evidence that they are working together in, for example, the same pathway, but it's nowhere near proof.
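To make the correlation idea concrete, here is a minimal sketch in plain Python (standard library only). The expression values and the names `gene_a` and `gene_b` are made up for illustration, and the permutation test is just one simple way to attach a p-value; it is not the method of any particular package.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r|.

    Shuffles y to break any real association and counts how often
    a shuffled |r| is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical normalised expression values across 8 samples
gene_a = [2.1, 3.4, 1.9, 5.6, 4.8, 3.0, 6.1, 2.7]
gene_b = [1.8, 3.9, 2.2, 5.1, 5.0, 2.6, 6.4, 3.1]

r = pearson(gene_a, gene_b)
p = permutation_p_value(gene_a, gene_b)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

A high r with a small p only tells you that the two genes vary together across these samples; it says nothing about mechanism or direction.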
The type of analysis that you want to do is probably suitable for network analysis, for which there are quite a few tools available, namely igraph, WGCNA, and STRING.
The igraph tutorial was written by me.
Whilst igraph and WGCNA don't take information about protein-to-protein interactions into account, STRING does do this.
Once you get your network constructed, you can look at metrics such as:
- Vertex degree
- Hub score
- Betweenness centrality
- Community structure
- Module eigengenes (WGCNA)
These can be calculated as part of igraph or, if using STRING or WGCNA, you can export your network into Cytoscape and then derive these metrics there.
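As a rough illustration of two of the metrics listed above (vertex degree and betweenness centrality), here is a self-contained Python sketch. It deliberately does not use igraph, WGCNA, or Cytoscape; the toy edge list and the names `geneA` to `geneE` are hypothetical, and the betweenness function is a plain implementation of Brandes' algorithm for an unweighted, undirected graph.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def vertex_degree(adj):
    """Number of neighbours of each vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each pair is counted from both ends in an undirected graph
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical network, e.g. edges kept after thresholding correlations
edges = [("geneA", "geneB"), ("geneB", "geneC"),
         ("geneB", "geneD"), ("geneD", "geneE")]
adj = build_adjacency(edges)
deg = vertex_degree(adj)
bc = betweenness(adj)
for gene in sorted(adj):
    print(gene, deg[gene], bc[gene])
```

In this toy network `geneB` has the highest degree and betweenness, which is the kind of pattern you would look for when asking whether a gene sits at the centre of a module.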
Kevin
Thank you, Kevin. I have actually used WGCNA to build some static modules for my gene list. If the results show that gene B is significantly correlated (let's say just from the p-value) with most of the modules built, can I then say that gene B is working together with these genes? Since I only have RNA data at hand, I will try your igraph tutorial on my data. Thank you for your professional interpretation of my question.
I see. Well, it's still an association that is fundamentally based on correlation. Obviously the evidence is stronger if your sample set is large and balanced / matched between your conditions / groups of interest. You would still need to experimentally prove the association in the lab, depending on the importance of the work that you're doing, of course. If this is for a clinical test, for example, then I doubt that anyone would accept it without further wet-lab validation. If it's pure research and a hypothesis-generating study, then fair enough.
Your next step, for example, could be to look at an isolated panel of your genes using Fluidigm, NanoString, or some other targeted approach (and on a larger batch of samples), or even to go as far as looking at protein expression of your genes via CyTOF.
I already have quite a bit of wet-lab data to support the link between A and B, so the analysis should provide some additional evidence for my assumption. I had a half-formed idea that I should try some of the fancier machine learning-based packages for network analysis, but it seems that most of these packages are designed for time-series data and are not a good fit for my clinical samples.