
Module preservation analysis using WGCNA

Posted 3.2 years ago by seta

Dear Friends,

I have a gene expression microarray dataset (about 17,000 genes) of about 400 cancer samples from different cancer subtypes (say A, B, C, and D) and about 30 control samples, collected from different datasets (meta-analysis). I used only the cancer samples, took the 50% of genes with the highest variance as input for WGCNA, and selected the signed network type. I treated the subtypes as binary traits and used WGCNA to find modules associated with these traits and their hub genes. I found that some modules are significantly associated only with subtypes A and B.

Next, I applied module preservation analysis to examine whether the modules associated with subtype A are preserved in the other subtypes. I used subtype A as the reference and each other subtype as a test set, running the preservation analysis separately for each pair. As more or less expected, the modules associated with subtype A are not preserved in the other subtypes. However, I have some questions about this; please share your suggestions.
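For readers less familiar with the setup described above, here is a minimal Python/NumPy sketch of the preprocessing: keeping the 50% most variable genes, building a signed adjacency, and coding each subtype as a 0/1 trait. It is not WGCNA itself (in R one would use WGCNA's `adjacency()` with `type = "signed"` and pick the power with `pickSoftThreshold()`); the function names and the power `beta = 12` are hypothetical placeholders, not values from the post.

```python
import numpy as np


def top_variance_genes(expr, fraction=0.5):
    """Return sorted column indices of the `fraction` most variable genes.

    expr: samples x genes matrix (rows = cancer samples, columns = genes).
    """
    expr = np.asarray(expr, dtype=float)
    variances = expr.var(axis=0)
    n_keep = int(np.ceil(fraction * expr.shape[1]))
    # argsort is ascending, so the last n_keep indices are the most variable genes;
    # sorting them afterwards keeps the original gene order.
    return np.sort(np.argsort(variances)[-n_keep:])


def signed_adjacency(expr, beta=12):
    """Signed adjacency a_ij = ((1 + cor_ij) / 2) ** beta.

    cor = -1 maps to 0 and cor = 1 maps to 1, so the sign of the correlation
    is kept. beta = 12 is a placeholder; in practice it is chosen from a
    scale-free topology fit.
    """
    cor = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)  # genes x genes
    return ((1.0 + cor) / 2.0) ** beta


def binary_traits(labels, subtypes):
    """One 0/1 indicator vector per subtype (1 = sample belongs to that subtype)."""
    labels = np.asarray(labels)
    return {s: (labels == s).astype(int) for s in subtypes}
```

Each indicator vector can then be correlated with the module eigengenes, which is how modules specific to subtypes A and B would show up in a module-trait analysis.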

Are the steps above logical in your view? Is it reasonable to do module preservation analysis within the same dataset?

I'm also thinking of doing module preservation analysis with the control samples as the reference and each subtype as a test set. However, I'm still not sure about it, since the control group is rather small (28 samples) and some modules will obviously be non-preserved between the controls and each cancer subtype. I would appreciate your comments and suggestions.

Regarding the module preservation analysis, I have read that the Zsummary statistic depends strongly on module size, so I used the medianRank statistic instead and considered modules with medianRank ≥ 8 as non-preserved. Is that acceptable?
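For context, Langfelder et al. (2011), who introduced these statistics, define Zsummary as the average of Zdensity and Zconnectivity, and the commonly cited reading is Zsummary > 10 for strong evidence of preservation, 2 < Zsummary < 10 for weak to moderate evidence, and Zsummary < 2 for no evidence. A minimal Python sketch of that rule, assuming the component Z scores have already been computed (for example by WGCNA's `modulePreservation()`); the function names and the toy Z scores are hypothetical:

```python
def z_summary(z_density, z_connectivity):
    """Zsummary = (Zdensity + Zconnectivity) / 2."""
    return (z_density + z_connectivity) / 2.0


def preservation_call(zsummary, weak=2.0, strong=10.0):
    """Map a Zsummary value to the commonly cited evidence categories.

    Values exactly at a threshold fall into the lower category here;
    the thresholds are rules of thumb, not hard cutoffs.
    """
    if zsummary > strong:
        return "strong"
    if zsummary > weak:
        return "weak to moderate"
    return "none"


# Hypothetical component Z scores for three modules (illustration only, not real results).
toy_modules = {"module_1": (14.0, 9.0), "module_2": (4.0, 3.0), "module_3": (1.0, 0.5)}
for name, (zd, zc) in toy_modules.items():
    zs = z_summary(zd, zc)
    print(name, zs, preservation_call(zs))
```

Unlike a medianRank cutoff, these thresholds are absolute: they do not depend on how many modules are being compared.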

Thank you in advance


Last updated 3.2 years ago by andres.firrincieli


Hi seta,

I can't comment on your first point, as you have a fairly complex experimental design for a differential co-expression analysis. In this work, the authors compared two reference networks (susceptible and resilient) against a control dataset. Each dataset included expression data from four brain regions (PFC, NAC, AMY, VHIP) and three time points (early, late, stress-primed). Perhaps this could help.

Regarding Zsummary and medianRank, ultimately you can use both. Keep in mind that medianRank only ranks modules by how well they are preserved relative to one another; it will not tell you which modules are not preserved in the test set. You will still find that the modules with the highest medianRank are also those with the lowest Zsummary. In my experience, the dependence on module size is mainly an issue for modules with fewer than 100 genes.
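To illustrate why medianRank is only relative, here is a simplified Python sketch. As we understand the definition in Langfelder et al. (2011), medianRank.pres averages a module's median rank over the density statistics and its median rank over the connectivity statistics, where rank 1 is the most preserved module. Because ranks always run from 1 to n, even a set of uniformly well-preserved modules gets a "worst" module, so the sketch uses Zsummary < 2 for the absolute call. Function names and the `z_cutoff` parameter are illustrative assumptions, and ties are broken by module order here, which WGCNA may handle differently.

```python
import numpy as np


def column_ranks(stats):
    """Rank modules within each statistic column; rank 1 = largest value
    (= strongest observed preservation). Ties are broken by row order."""
    stats = np.asarray(stats, dtype=float)
    ranks = np.empty(stats.shape, dtype=int)
    for j in range(stats.shape[1]):
        order = np.argsort(-stats[:, j], kind="stable")
        ranks[order, j] = np.arange(1, stats.shape[0] + 1)
    return ranks


def median_rank_pres(density_stats, connectivity_stats):
    """Average of per-module median ranks of density and connectivity statistics.

    Both inputs are modules x statistics arrays of observed values.
    """
    med_density = np.median(column_ranks(density_stats), axis=1)
    med_connectivity = np.median(column_ranks(connectivity_stats), axis=1)
    return (med_density + med_connectivity) / 2.0


def summarize(names, zsummary, med_rank, z_cutoff=2.0):
    """List modules from most to least preserved (by medianRank) and flag as
    not preserved only those whose Zsummary is below z_cutoff."""
    order = np.argsort(med_rank, kind="stable")
    return [(names[i], float(med_rank[i]), bool(zsummary[i] < z_cutoff)) for i in order]
```

In practice both statistics come out of WGCNA's `modulePreservation()`, so no re-implementation is needed; the sketch only suggests why a cutoff such as medianRank ≥ 8 depends on how many modules are being compared, while a Zsummary threshold does not.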