Dear Friends,
I have a gene expression microarray dataset (about 17,000 genes) of about 400 cancer samples covering different subtypes (say A, B, C, and D) and about 30 control samples, collected from several datasets (meta-analysis). I used only the cancer samples, took the 50% of genes with the highest variance as input for WGCNA, and chose the signed network type. I treated the subtypes as binary traits and used WGCNA to find modules associated with each trait, along with the corresponding hub genes. I found that some modules are significantly associated with subtypes A and B only. Next, I ran a module preservation analysis to check whether the modules associated with subtype A are preserved in the other subtypes: I took subtype A as the reference and each other subtype as a test set, and ran the analysis separately for each reference/test pair. As more or less expected, the modules associated with subtype A are not preserved in the other subtypes. I have some questions about this and would appreciate your suggestions.
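To make the preprocessing concrete, here is a minimal sketch of the two steps described above: keeping the half of the genes with the highest variance, and turning subtype labels into binary traits. WGCNA itself is typically run in R; this Python version only illustrates the logic. The function names `top_variance_genes` and `binary_traits` and the toy values are hypothetical, not taken from the actual analysis.

```python
from statistics import variance


def top_variance_genes(expr, fraction=0.5):
    """Return the names of the most variable genes.

    expr maps gene name -> list of expression values across samples
    (hypothetical layout, chosen for illustration only).
    """
    ranked = sorted(expr, key=lambda g: variance(expr[g]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def binary_traits(subtypes):
    """Encode each subtype as its own 0/1 trait vector (one per subtype)."""
    levels = sorted(set(subtypes))
    return {lvl: [1 if s == lvl else 0 for s in subtypes] for lvl in levels}


# Toy data: 4 genes measured in 4 samples (made-up numbers).
expr = {
    "g1": [1.0, 5.0, 9.0, 2.0],
    "g2": [3.0, 3.1, 2.9, 3.0],
    "g3": [0.0, 4.0, 0.0, 4.0],
    "g4": [2.0, 2.0, 2.1, 2.0],
}
subtypes = ["A", "B", "A", "C"]

selected = top_variance_genes(expr, fraction=0.5)
traits = binary_traits(subtypes)
print(selected)
print(traits["A"])
```

Each trait vector can then be correlated with the module eigengenes, one subtype at a time.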
- Are the above working steps logical in your view? Is it reasonable to do module preservation analysis within the same dataset?
I’m also thinking of running the module preservation analysis with the control samples as the reference and each subtype as a test set. However, I’m still not sure about it, since the control sample size is fairly small (28 samples) and some modules will obviously be non-preserved between control and each cancer subtype. Please advise me with your comments and suggestions.
- Regarding the module preservation analysis, as I read, the Zsummary statistic has a strong dependence on module size, so I used the medianRank statistic and considered modules with medianRank ≥ 8 as non-preserved. Is this acceptable?
Thank you in advance
Hi seta,
I can't comment on the first point as you have a fairly complex experimental design for a differential co-expression analysis. In this work the authors compared two reference networks (susceptible and resilient) against a control dataset. Each dataset included expression data from 4 brain regions (PFC, NAC, AMY, VHIP) and time points (early, late, stress-primed). Perhaps this could help.
Regarding Zsummary and medianRank, ultimately you can use both. Keep in mind that medianRank only orders modules from the least to the most preserved; it does not tell you which modules are not preserved in the test set. You will still find that the modules with the highest medianRank are also those with the lowest Zsummary. In my experience, the dependence on module size mostly matters when a module includes fewer than 100 genes.
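As a rough illustration of using both statistics together, the sketch below labels each module by Zsummary and then orders the modules by medianRank. The cutoffs 2 and 10 are the commonly cited guideline from the WGCNA module preservation paper (below 2: no evidence of preservation; 2 to 10: weak to moderate; above 10: strong); they are an assumption here, not something derived from this thread. The 100-gene flag reflects the observation above, and the module names and numbers are made up.

```python
def classify_modules(stats, z_low=2.0, z_high=10.0, small_size=100):
    """Label modules by Zsummary, then order them by medianRank.

    stats maps module name -> dict with keys 'size', 'Zsummary', 'medianRank'
    (hypothetical layout; in R these values come from the modulePreservation output).
    """
    rows = []
    for name, s in stats.items():
        z = s["Zsummary"]
        if z < z_low:
            evidence = "none"
        elif z < z_high:
            evidence = "weak-to-moderate"
        else:
            evidence = "strong"
        rows.append({
            "module": name,
            "evidence": evidence,
            "medianRank": s["medianRank"],
            # Small modules are where Zsummary is most affected by module size.
            "size_caveat": s["size"] < small_size,
        })
    # Lower medianRank means relatively better preserved.
    rows.sort(key=lambda r: r["medianRank"])
    return rows


# Made-up example values, for illustration only.
example = {
    "pink": {"size": 60, "Zsummary": 1.3, "medianRank": 9},
    "blue": {"size": 850, "Zsummary": 14.2, "medianRank": 1},
    "brown": {"size": 320, "Zsummary": 6.5, "medianRank": 4},
}

result = classify_modules(example)
for row in result:
    print(row)
```

Here medianRank supplies the relative ordering, while the Zsummary label is what supports a call of "not preserved"; a fixed medianRank cutoff on its own depends on how many modules are being ranked.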