I have 2 coexpression networks: (1) control phenotype; and (2) treatment phenotype. They use the same subset of genes (n=1700), so they are directly comparable. I used WGCNA to build the 2 networks separately, with the same soft-thresholding power (beta=10) for both. This question is more conceptual than about specific commands or tools: I am looking for methods that let me compare network topology between the 2 phenotypes. My first idea was to calculate the differential coexpression network as suggested in this paper https://www.nature.com/articles/srep13295 but the resulting matrix is a little awkward to work with. The only other approach I could think of was a pairwise comparison of modules between the two networks to get an overlap measure.
Does anyone know of any methods, tools, or approaches that work well for datasets like this?
My main research question is to identify the biggest changes in network topology between the 2 phenotypes.
Dataset description:
It's a dataset with 49 samples from patients without the disease and 34 from patients with the disease. There are 28 twin pairs in the dataset (n=28*2=56) and 27 patients whose twin is not accounted for in the dataset, for a total of 56 + 27 = 83 samples/patients. The subset I'm investigating has 1700 genes, so each network consists of a 1700x1700 symmetric similarity matrix. There is one network for the healthy state and a separate one for the diseased state.
I have seen people analyze the topology of each network independently (for example node degree, betweenness centrality, etc.), remove the genes that have a high degree in both networks, and keep only the genes whose centrality is high in the treatment network alone as treatment-specific hub genes (differential connectivity). In WGCNA, I have also seen people use the control network as a reference network to calculate module preservation Z-scores, which help find the modules that are more related to the treatment.
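A minimal sketch of the differential connectivity idea described above, in Python with NumPy. It assumes an unsigned adjacency |cor|^beta, and the function names and toy data (20 genes, with a planted treatment-only hub cluster in genes 0-4) are hypothetical and only illustrate the calculation:

```python
import numpy as np

def adjacency(expr, beta=10):
    """Unsigned adjacency |cor|^beta for a samples x genes matrix."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # ignore self-connections
    return adj

def scaled_connectivity(expr, beta=10):
    """Mean adjacency of each gene to all other genes."""
    adj = adjacency(expr, beta)
    return adj.sum(axis=0) / (adj.shape[0] - 1)

# Toy data (assumption): 49 control and 34 treatment samples, 20 genes.
rng = np.random.default_rng(0)
control = rng.normal(size=(49, 20))
treatment = rng.normal(size=(34, 20))
# Plant a tightly co-expressed cluster (genes 0-4) in the treatment group only.
shared = rng.normal(size=(34, 1))
treatment[:, :5] = shared + 0.1 * rng.normal(size=(34, 5))

# Positive values: gene is more connected in treatment than in control.
diff_k = scaled_connectivity(treatment) - scaled_connectivity(control)
top_genes = np.argsort(diff_k)[::-1][:5]
```

Dividing by the number of other genes keeps the two networks on the same scale. With real data you would rank all 1700 genes by diff_k and probably want a permutation of the phenotype labels to judge which differences are larger than expected by chance.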
That's pretty interesting to use the control network as a base for looking at distributions. Thanks! What about module comparison?
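As I mentioned, WGCNA's module preservation analysis, with the control network as the reference, can tell us which modules are most dysregulated in the treatment network. Note the direction: a high preservation Z-score means the module is well preserved, so the modules with the lowest Z-scores are the ones most affected by the treatment.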
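To make the Z-score idea concrete, here is a simplified sketch in Python. It is not a reimplementation of WGCNA's modulePreservation, which combines several density and connectivity statistics. It only compares the mean adjacency inside one module, measured in the test (treatment) network, against random gene sets of the same size. The function names and the toy adjacency matrix are assumptions for illustration:

```python
import numpy as np

def module_density(adj, idx):
    """Mean adjacency over all gene pairs within the index set."""
    sub = adj[np.ix_(idx, idx)]
    upper = np.triu_indices(len(idx), k=1)
    return float(sub[upper].mean())

def density_z(adj_test, module_idx, n_perm=200, seed=0):
    """Z-score of the module's density against random same-size gene sets."""
    rng = np.random.default_rng(seed)
    n_genes = adj_test.shape[0]
    observed = module_density(adj_test, module_idx)
    null = np.array([
        module_density(
            adj_test,
            rng.choice(n_genes, size=len(module_idx), replace=False),
        )
        for _ in range(n_perm)
    ])
    return (observed - null.mean()) / null.std(ddof=1)

# Toy test network (assumption): 40 genes with weak background adjacency
# and one dense module (genes 0-9) defined in the reference network.
rng = np.random.default_rng(1)
n = 40
noise = rng.uniform(0.0, 0.1, size=(n, n))
adj = (noise + noise.T) / 2
module = np.arange(10)
adj[np.ix_(module, module)] = 0.8
np.fill_diagonal(adj, 0.0)

z = density_z(adj, module)
```

Here the module is still dense in the test network, so z comes out large. A control module that has dissolved in the treatment network would give a z near zero.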
I'm having trouble finding references for this approach but it seems very useful. I have been having difficulty relating modules to categorical metadata. Is this what you are referring to above? Or do you mean investigating a particular attribute, e.g. intramodular connectivity of genes, using all of the values from the control network as a background distribution, and doing a Mann-Whitney test (or similar) for the module of interest against the background distribution? Sorry to ask so many questions but I just want to make sure I understand what you are suggesting.
Exactly, I mean what you mentioned: "investigating a particular attribute, e.g. intramodular connectivity of genes, and use all of the values from the control network as a background distribution and do a Mann-Whitney (or similar) for the module of interest against the background distribution". However, the outcome of this procedure is limited: for instance, if the blue module in your network seems to be significantly correlated with the treatment, this procedure (using the control network as reference) lets you state that the correlation is meaningful and not just due to chance (higher z-score). Please note that the "relating modules to categorical metadata" step is performed beforehand, to find the modules correlated with your trait of interest. Personally, since I don't have access to a trait file (for example cholesterol measurements, etc.), I have to prepare a binary trait file (to show which modules or genes are related to the treatment). I have already collected WGCNA code for each step and can share it if you want, but you will need to change the input, etc. for your purpose.
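A minimal sketch of the background-distribution test discussed above, in Python. The U statistic is computed directly and the p-value uses the normal approximation without tie or continuity correction, so treat it as an illustration; in practice scipy.stats.mannwhitneyu is the usual choice. The connectivity values are made up:

```python
import math
import numpy as np

def mann_whitney_u(sample, background):
    """U statistic for sample vs background, with z and a two-sided
    normal-approximation p-value (no tie or continuity correction)."""
    x = np.asarray(sample, dtype=float)[:, None]
    y = np.asarray(background, dtype=float)[None, :]
    # Count pairs where the sample value beats the background value.
    u = float((x > y).sum() + 0.5 * (x == y).sum())
    n, m = x.shape[0], y.shape[1]
    mean_u = n * m / 2.0
    sd_u = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, z, p

# Hypothetical values: connectivity of all genes in the control network
# (background) and of one module's genes in the treatment network.
background_k = np.linspace(0.0, 1.0, 100)
module_k = np.linspace(2.0, 3.0, 10)

u, z, p = mann_whitney_u(module_k, background_k)
```

A positive z means the module's values tend to sit above the background. Connectivity values of genes in the same network are not independent, so the p-value is best read as a ranking aid and not as a strict significance level.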
If you have code snippets that illustrate how to correlate the eigengenes with categorical data like treatment, that would be very useful. I'm doing most of the computation outside of R in Python, so a good understanding of the methods is crucial. For example, if I have continuous data then I just calculate the 1st principal component of the module and then the Pearson correlation with the continuous metadata, but you can't do that with categorical data.
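For the Python side, a minimal sketch of the eigengene step: the module eigengene is taken as the first principal component of the standardized module expression, then correlated with a 0/1 coding of the phenotype. The sign alignment with average expression and all names and toy data here are assumptions for illustration, not WGCNA's exact implementation:

```python
import numpy as np

def module_eigengene(expr):
    """First principal component of a samples x genes module matrix."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # The sign of a principal component is arbitrary; align it with the
    # average standardized expression so the direction is interpretable.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# Toy data (assumption): 49 controls coded 0, 34 cases coded 1, and a
# 30-gene module whose expression is shifted upward in the cases.
rng = np.random.default_rng(2)
trait = np.array([0] * 49 + [1] * 34)
expr = 2.0 * trait[:, None] + rng.normal(size=(83, 30))

eigengene = module_eigengene(expr)
r = float(np.corrcoef(eigengene, trait)[0, 1])
```

With a two-level phenotype the 0/1 coding is enough. A categorical variable with more than two levels would need one indicator column per level, or an ANOVA-style test on the eigengene.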
My code is in R. My trait is binary, so I made a 1/0 trait file and use
for correlation, as mentioned in WGCNA, although I am not sure if I am correct.
Please consider this
https://ibb.co/mftzq7
I'm getting values for this but I don't know if it makes sense, since we wouldn't necessarily expect there to be a linear correlation between a continuous variable and a categorical variable, right?
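On whether this makes sense: Pearson correlation against a 0/1 variable is the point-biserial correlation, and its significance test is equivalent to a two-sample t-test with pooled variance, so it measures the difference in group means. A small Python sketch that checks the identity numerically on made-up data (all names are hypothetical):

```python
import numpy as np

def t_from_r(r, n):
    """t statistic implied by a Pearson correlation with n samples."""
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))

def pooled_t(x, group):
    """Student two-sample t statistic (group 1 minus group 0)."""
    a, b = x[group == 1], x[group == 0]
    n1, n0 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n0 - 1) * b.var(ddof=1)) / (n1 + n0 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n0))

# Toy data (assumption): an eigengene-like variable for 49 controls (0)
# and 34 cases (1), with a modest shift in the cases.
rng = np.random.default_rng(3)
group = np.array([0] * 49 + [1] * 34)
x = rng.normal(size=83) + 0.8 * group

r = float(np.corrcoef(x, group)[0, 1])
t_r = float(t_from_r(r, len(x)))
t_pooled = float(pooled_t(x, group))
```

The two t values agree up to floating-point error, so the correlation with a binary trait gives the same answer as asking whether the eigengene differs between the two groups.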
Actually, this is also my question. I read somewhere that people use the biweight midcorrelation (bicor) function instead of the Pearson cor function or, if I am not wrong, take the log of the binary trait file. However, I am confused about which is correct. Please share with me if you find the answer. Thanks
As with module preservation (comparing the treatment network with the control network), comparing modules with each other (conservation) will give you the genes conserved between two modules of interest; this is implemented in WGCNA.
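A minimal sketch of a pairwise module comparison in Python, assuming each network's module assignment is available as a gene-to-module mapping over the same gene set. It reports the Jaccard index and a hypergeometric upper-tail p-value for every pair of modules. The function names and the 10-gene toy labels are hypothetical:

```python
from math import comb

def modules_from_labels(labels):
    """Turn a gene -> module mapping into module -> set of genes."""
    modules = {}
    for gene, module in labels.items():
        modules.setdefault(module, set()).add(gene)
    return modules

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)

def hypergeom_upper_tail(k, N, K, n):
    """P(overlap >= k) when n genes are drawn from N, K of which
    belong to the reference module."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def compare_modules(labels_a, labels_b):
    """Jaccard index and overlap p-value for every module pair."""
    mods_a = modules_from_labels(labels_a)
    mods_b = modules_from_labels(labels_b)
    n_genes = len(labels_a)
    results = {}
    for name_a, genes_a in mods_a.items():
        for name_b, genes_b in mods_b.items():
            k = len(genes_a & genes_b)
            p = hypergeom_upper_tail(k, n_genes, len(genes_a), len(genes_b))
            results[(name_a, name_b)] = (jaccard(genes_a, genes_b), p)
    return results

# Toy labels (assumption): 10 genes, two modules per network.
control_labels = {f"g{i}": ("blue" if i < 5 else "red") for i in range(10)}
treatment_labels = {f"g{i}": ("A" if i < 4 else "B") for i in range(10)}

results = compare_modules(control_labels, treatment_labels)
jac, pval = results[("blue", "A")]
```

Control modules with no well-matching treatment module (low Jaccard and large p-value against every treatment module) are candidates for the biggest topology changes. With many module pairs the p-values should be corrected for multiple testing.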