Network construction is more flexible than you may imagine: at virtually every step there are multiple ways in which the construction can proceed. Based on the terminology that you've used, I imagine that your main experience of networks to date has been with WGCNA?
Networks can be constructed (and weighted) based on any distance metric, be it correlation, Euclidean distance, or something else. Their construction may even be guided by known protein-to-protein and pathway interactions (as is performed with STRINGdb). The chosen metric then gives the edge weight between any 2 vertices (vertex = node or gene), and edges whose weight falls below a chosen threshold are removed. For example, we may construct a co-expression network based on pairwise correlations between all genes and then remove the edges whose absolute Pearson r falls below 0.8, leaving only very strong connections. The point is that correlations in the range -1 to +1 are already enough to portray a network. In fact, I have a simple tutorial for this, here: Network plot from expression data in R using igraph
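As a rough illustration of the thresholding described above, here is a minimal Python sketch (not the R/igraph code from the tutorial). The gene names, the expression values, and the helper names `pearson` and `correlation_edges` are all made up for the example; only the |r| >= 0.8 cut-off comes from the text.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_edges(expr, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges

# Hypothetical expression values (4 samples per gene), for illustration only.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],  # rises with geneA
    "geneC": [4.0, 3.1, 1.9, 1.0],  # falls as geneA rises
    "geneD": [2.0, 1.0, 2.0, 1.0],  # only weakly related to the others
}

for g1, g2, r in correlation_edges(expr):
    print(f"{g1} -- {g2}: r = {r:.2f}")
```

With these toy values, geneD ends up with no edges, while geneA, geneB and geneC remain connected to each other; the sign of r is kept as the edge weight so that negative correlations stay distinguishable.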
Regarding TOM (topological overlap matrix), this term was brought into WGCNA by Steve Horvath (I believe), but the original logic behind it was described in a Science publication back in 2002: Hierarchical organization of modularity in metabolic networks. As far as I'm aware, the TOM is a similarity measure derived from the network's adjacency matrix: 2 genes have high topological overlap when they are connected to each other and also share many of the same neighbours. In WGCNA, the genes are then hierarchically clustered on the corresponding dissimilarity (1 - TOM), with modules essentially being just branches in the dendrogram that are defined based on a tree 'cut height', and the TOM heatmap then showing how strongly genes overlap within and between modules. Principal components analysis (PCA) is then performed on each of these modules, with the first principal component serving as the module 'eigengene'.
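For reference, topological overlap is computed per pair of nodes from an adjacency matrix. The sketch below uses the unsigned form as it is commonly written for WGCNA, TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the connectivity of node i. Treat it as an illustration to be checked against the WGCNA documentation rather than a reimplementation of that package; the function name and the toy adjacency matrix are assumptions.

```python
def topological_overlap(adj):
    """Unsigned topological overlap for a symmetric adjacency matrix
    with entries in [0, 1] and a zero diagonal."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each node
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # a node overlaps fully with itself
                continue
            # Connection strength routed through shared neighbours u
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Hypothetical network: nodes 0 and 1 are not linked directly,
# but both are linked to nodes 2 and 3.
adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
tom = topological_overlap(adj)
print(tom[0][1])  # overlap of two unlinked nodes with identical neighbours
print(tom[0][2])  # overlap of two linked nodes with no shared neighbours
```

In this toy network, nodes 0 and 1 get a higher overlap than the directly linked nodes 0 and 2, which shows how the measure rewards shared neighbourhoods and not only direct edges.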
A similar logic to TOM and its modules comes with community structure identification, which I mention briefly at the end of my tutorial (Step 4).
-------------------------
Specifically related to your questions, you can therefore weight a network in any way. The weights can represent correlation strength, Euclidean distance, or anything else, such as reaction efficiency (enzymes), distance (kilometers / miles), et cetera. If you wish to set a threshold for edges to keep, as I mentioned above, then you can simply dichotomise the edge weights as being:
- 0 (below set threshold; no edge)
- 1 (above threshold; edge present)
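A minimal sketch of this dichotomisation, assuming the weights are held in a symmetric matrix and that the threshold is applied to the absolute weight (as with the absolute Pearson r example). The function name, the choice of `>=` over `>`, and the weight values are assumptions made for illustration.

```python
def dichotomise(weights, threshold):
    """Binary adjacency: 1 where |weight| >= threshold, else 0; no self-loops."""
    n = len(weights)
    return [
        [1 if i != j and abs(weights[i][j]) >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical symmetric weight matrix (e.g. pairwise correlations of 4 genes).
weights = [
    [1.00, 0.92, -0.85, 0.10],
    [0.92, 1.00, -0.40, 0.05],
    [-0.85, -0.40, 1.00, 0.30],
    [0.10, 0.05, 0.30, 1.00],
]

adjacency = dichotomise(weights, threshold=0.8)
for row in adjacency:
    print(row)
```

Note that dichotomising discards both the sign and the magnitude of the original weights, so it is worth keeping the weighted matrix around if those matter later.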
The main idea that I want you to get from reading this answer, though, is that network construction is very flexible.
Kevin
For someone like me, the flexibility is sort of the problem: there's always an opportunity to make a change and observe the impact. Quick question (not sure about your particular expertise): do you know the differences between constructing a gene expression network from Pearson correlation versus Euclidean distance?
Right, and that's in part why network analysis has not made the progress that it ought to have. It ends up confusing people, and results can be hyper-variable even after just a few minor modifications to the parameters. It is frowned upon by some experienced statisticians that I know. The figures can look absolutely beautiful but often they lack biological meaning, or the biological meaning is too difficult to extract.
Pearson correlation and Euclidean distance are just 2 different statistical measures, both of which are better applied to normally-distributed data. The key practical difference is that correlation compares the shape of 2 expression profiles regardless of their magnitude, whereas Euclidean distance is also sensitive to the absolute expression levels. If your data is not normally distributed or your dataset is small, you should be using Spearman correlation. If you still want to apply Euclidean distance in such a situation, you can most likely get away with it through review.
Correlation can be easy to explain, as it's just 'Are these genes positively or negatively co-expressed?' - this is intuitive to most people and the biological meaning is easier to grasp.
Euclidean distance, the square root of the sum of the squared differences between 2 genes' values across all samples, is a bit more difficult to explain and the biological meaning can immediately become lost. Yet, Euclidean distance is arguably the most common distance metric used in clustering and, from my perspective, is warranted for use in network analysis.
-------------------------------------
If we have a small data-set of just 3 samples (cols) and 2 genes (rows):
The Euclidean distance for gene1 and gene2 is:
Check:
Spearman correlation is:
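To make the two calculations concrete, here is a small Python sketch for 2 genes measured in 3 samples. The expression values are made up for illustration and are not the numbers from the original example; the Spearman helper uses the rank-difference shortcut, which is only valid when there are no tied values.

```python
from math import sqrt

def euclidean(x, y):
    """Square root of the sum of squared per-sample differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical values: 2 genes (rows) across 3 samples (cols).
gene1 = [1.0, 5.0, 3.0]
gene2 = [4.0, 1.0, 3.0]

print(euclidean(gene1, gene2))  # sqrt(9 + 16 + 0) = 5.0
print(spearman(gene1, gene2))   # ranks are exactly reversed, so -1.0
```

The two measures answer different questions here: the Euclidean distance says how far apart the profiles are in absolute terms, while Spearman only says that the genes are ordered in opposite directions across the samples.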
Not sure why I gave this simple example but anyway.
Kevin