Question

Weighted Gene Expression Network

6.7 years ago

shahroze786 • 0

I have a basic network question. I've been trying to research the typical methodology behind building a gene expression network. As I understand it so far, the steps are as follows: create a Pearson correlation matrix -> create an adjacency matrix (weighted or unweighted) -> create a topological overlap matrix (TOM; there are variations of this, such as the generalized TOM). I know the correlation values will be between -1 and 1. The unweighted adjacency matrix will be 0/1 based on a hard cutoff, whereas a weighted one will take values between 0 and 1, emphasizing the differences. My question is in two parts: Can you take a weighted adjacency matrix and give it further edge weights, or does edge weighting only apply to an unweighted adjacency matrix? Also, can the TOM contain values between 0 and 1, or is it a matrix of only 0s and 1s?

WGCNA network R • 2.7k views

updated 6.7 years ago by Kevin Blighe • written 6.7 years ago by shahroze786

score 3 · Accepted Answer · 2018-03-24

Network construction is more flexible than you may imagine: at virtually every step, there are multiple ways in which the construction can proceed. Based on the terminology that you've used, I imagine that your main experience of networks to date has been WGCNA?

Networks can be constructed (and weighted) based on any similarity or distance measure, be it correlation, Euclidean distance, or something else. Their construction may even be guided by known protein-protein and pathway interactions (as is performed with STRINGdb). The chosen measure then represents the weight of the edge between any 2 vertices (vertex = node or gene), and edges whose weight falls below a particular threshold can be removed as too weak. For example, we may construct a co-expression network based on pairwise correlations between all genes and then remove correlations (representing edges) whose absolute Pearson r falls below 0.8, leaving only very strong connections. The point is that one can easily portray a network based directly on correlations in the range -1 to +1. In fact, I have a simple tutorial for this, here: Network plot from expression data in R using igraph
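To make the thresholding idea concrete, here is a minimal sketch in plain Python (standard library only, rather than R/igraph as in the tutorial). The gene names, expression values, and the `pearson` helper are made up for illustration; only the 0.8 cutoff comes from the example above.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy data: expression of 4 genes across 5 samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.2],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],  # no clear relationship
}

THRESHOLD = 0.8  # keep an edge only if |r| >= 0.8

# Edge list: each kept gene pair maps to its signed correlation (the weight).
edges = {}
for g1, g2 in combinations(expression, 2):
    r = pearson(expression[g1], expression[g2])
    if abs(r) >= THRESHOLD:
        edges[(g1, g2)] = r

for (g1, g2), r in edges.items():
    print(f"{g1} -- {g2}  r = {r:+.3f}")
```

Note that the surviving edges keep their signed correlation as the weight, so strong negative correlations are retained alongside strong positive ones; geneD ends up with no edges at all.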

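Regarding TOM, this was more a term popularised by Steve Horvath for WGCNA (I believe), but the original logic behind it was described in a Science publication back in 2002: Hierarchical organization of modularity in metabolic networks. As far as I'm aware, the TOM is itself a gene-by-gene matrix: each entry measures how strongly 2 genes are connected, both directly and through the neighbours that they share, and it takes a value between 0 and 1 (not just 0 or 1) whenever the adjacency values lie in that range. The final modules of the WGCNA process are then derived from it: genes are hierarchically clustered on the dissimilarity 1 - TOM, with modules essentially being just branches in the dendrogram that are defined based on a tree 'cut height', and the 'matrix' heatmap then showing this block structure. Principal components analysis (PCA) is then performed on each of these modules to summarise it by its first principal component (the module 'eigengene').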
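To show the range of values that a topological overlap calculation produces, here is a minimal sketch in plain Python. It uses the commonly cited formula TOM_ij = (sum over u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k_i is the summed edge weight of gene i. This is not the WGCNA implementation, and both toy adjacency matrices are made up.

```python
def topological_overlap(adj):
    """Topological overlap for a symmetric adjacency matrix (values in [0, 1])."""
    n = len(adj)
    # Connectivity of each node: summed edge weight to all other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal is set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Weight shared through every third node u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Hypothetical unweighted (0/1) adjacency: edges 0-1, 0-2, 1-2, 2-3.
binary_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical weighted adjacency with values between 0 and 1.
weighted_adj = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]

tom_binary = topological_overlap(binary_adj)
tom_weighted = topological_overlap(weighted_adj)

for name, tom in (("binary", tom_binary), ("weighted", tom_weighted)):
    print(name)
    for row in tom:
        print("  ", [round(v, 3) for v in row])
```

Even with the 0/1 adjacency, nodes 0 and 3 (which are not directly connected but share neighbour 2) get an intermediate overlap value, so under this formula the result is a matrix of values between 0 and 1 rather than only 0s and 1s.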

A similar logic to TOM and its modules comes with community structure identification, which I mention briefly at the end of my tutorial (Step 4).

-------------------------

Specifically related to your questions, you can therefore weight a network in any way, and that includes re-weighting an adjacency matrix that is already weighted. The weights can represent correlation strength, Euclidean distance, or anything else, such as reaction efficiency (enzymes), distance (kilometers / miles), et cetera. If you wish to set a threshold for edges to keep, as I mentioned above, then you can simply dichotomise the edge weights as being:

0 (below set threshold; no edge)
1 (above threshold; edge present)
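As a rough side-by-side of the two options, the sketch below (plain Python) turns the same set of correlations into a weighted adjacency and into a 0/1 adjacency. The correlation values are invented, and the power `BETA = 6` is just an illustrative choice of soft-thresholding exponent, not a recommendation.

```python
# Hypothetical pairwise Pearson correlations between 3 genes.
correlations = {
    ("geneA", "geneB"): 0.95,
    ("geneA", "geneC"): -0.85,
    ("geneB", "geneC"): 0.40,
}

BETA = 6         # illustrative soft-thresholding power (an assumption)
THRESHOLD = 0.8  # hard cutoff on |r|

# Weighted adjacency: |r| raised to a power, so values stay in [0, 1]
# and weak correlations are pushed towards 0.
weighted = {pair: abs(r) ** BETA for pair, r in correlations.items()}

# Unweighted adjacency: 1 if the edge passes the cutoff, otherwise 0.
binary = {pair: int(abs(r) >= THRESHOLD) for pair, r in correlations.items()}

for pair in correlations:
    print(pair, f"weighted = {weighted[pair]:.4f}", f"binary = {binary[pair]}")
```

The weighted version keeps every edge but shrinks the weak one almost to zero, whereas the dichotomised version discards it outright.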

The main idea that I want you to get from reading this answer, though, is that network construction is very flexible.

Kevin

-------------------------
