Hi All,
I would like some clarification of terminology regarding a detail of gene coexpression network construction. Let's say I have two RNA-seq datasets, each containing n replicates, and each representing sequencing data from the same biological system under one of two different experimental conditions. How should I construct the data matrix for input to something like WGCNA if I want to analyze gene coexpression networks across experimental conditions/interventions?
What I imagine is that each row of the matrix represents data from one gene, and each column represents data collected from one of the replicates in an experimental condition. So for example, one particular row of the matrix would look like this:
         c1R1 ... c1Rn  c2R1 ... c2Rn
gene x  [val, ... val,  val, ... val]
Where the first column, c1R1, corresponds to the data from the first experimental replicate in the first condition, and the last column, c2Rn, corresponds to the nth experimental replicate in the second condition. For coexpression analysis, each row is then correlated with every other row in a pairwise fashion, an adjacency matrix is constructed from the resulting correlations, and further analyses such as module detection can be conducted on that adjacency matrix.
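To make the layout and the first downstream steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the gene names and expression values are made up, n = 3 replicates per condition is arbitrary, and the adjacency is computed as |Pearson correlation| raised to a soft-threshold power (beta = 6 is used here as an illustrative value, not a recommendation for real data). It is not WGCNA itself. Note also that WGCNA's own functions expect the transposed layout (samples in rows, genes in columns), so a genes-by-samples matrix like the one above would need transposing before being passed in.

```python
import math

# Column order: all replicates of condition 1, then all replicates of condition 2.
# n = 3 replicates per condition is an assumption made for this toy example.
columns = ["c1R1", "c1R2", "c1R3", "c2R1", "c2R2", "c2R3"]

# Hypothetical expression values: one row per gene, one value per column above.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # rises together with geneA
    "geneC": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],    # falls as geneA rises
    "geneD": [1.0, 3.0, 2.0, 3.0, 1.0, 2.0],    # unrelated to geneA
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unsigned_adjacency(expr, beta=6):
    """Correlate every gene row with every other row and soft-threshold the result.

    Returns a dict keyed by (gene_i, gene_j) holding |cor|**beta.
    beta is an assumed, illustrative soft-threshold power.
    """
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
    }


adj = unsigned_adjacency(expr)

# Print the adjacency matrix, one gene per row.
genes = list(expr)
for g in genes:
    print(g, [round(adj[(g, h)], 3) for h in genes])
```

Because the correlation is taken across all 2n columns at once, genes end up connected when their profiles move together over both conditions combined. Module detection would then operate on this adjacency matrix (or on a topological overlap matrix derived from it).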
I just want to verify that this is an appropriate method for organizing data if one wishes to construct coexpression networks for genes "across an intervention".
Hi Keith! Thanks for the comment, I really appreciate it. This term was thrown around a lot in the literature and I just wanted to make sure that I was interpreting it correctly. In general, I have been mindful of the concerns you raised in points 1 and 2. I would also add (for other readers who are perhaps new to the technique) that interpreting coexpression networks within some other biological context is crucial, and that the intended use of the coexpression analysis should be understood a priori. For example, does one intend to use the coexpression network to discover regulatory hubs in a poorly understood disease state, or to understand the co-regulatory structure of a well understood set of genes in a specific experimental condition? These are just two examples; there are many other possibilities.