Hi, I want to construct a co-expression network from 71 cancer samples (31 men and 40 women). My goal is to investigate the relationship between modules and traits and to find meaningful modules. When I constructed the co-expression network with all 71 samples and 5000 genes, I got the modules below, in which the ‘turquoise’ module is very large compared with the other modules:
Module name   yellow   turquoise   red   green   brown   blue   black
No. Genes     219      3747        99    171     272     437    55
As a second attempt, I constructed two separate networks, one for the 40 women and one for the 31 men, and got the results below.
Modules for 40 women:
Module name   yellow   turquoise   red   green   brown   blue   grey
No. Genes     239      2573        177   206     598     1196   11
Modules for 31 men:
Module name   magenta   turquoise   red   green   brown   blue   black
No. Genes     517       1548        309   313     439     1455   242
As you can see, when I separate the women from the men, I get better and more sensible clusters. In this situation, which strategy is best?
1- In the preprocessing step, treat sex as a binary variable, handle it as a batch effect, and remove it from my dataset with ComBat().
2- Ignore sex when constructing the network and continue my analysis with all 71 samples.
3- Use sex as a trait and investigate the relationship between the module eigengenes (MEs) and sex.
4- Construct two separate networks for women and men, find the consensus modules between them, and investigate the relationship between the consensus modules and the traits.
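To make strategy 3 concrete, here is a minimal sketch of what I mean, written in plain Python on synthetic data (not my real data). Everything in it is an assumption for illustration: the helper names `pearson`, `standardize` and `module_eigengene` are made up for this post, the eigengene is taken as the first principal component of the standardized genes in a module, and sex is coded as 1 = man, 0 = woman.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(values):
    """Scale one gene's expression to mean 0 and unit variance across samples."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def module_eigengene(genes, n_iter=200):
    """First principal component of a module, one value per sample.

    genes: list of genes, each a list of expression values per sample.
    Power iteration is used so that no external library is needed.
    """
    z = [standardize(g) for g in genes]
    n_samples = len(z[0])
    average = [sum(g[i] for g in z) / len(z) for i in range(n_samples)]
    v = average[:]  # starting vector for the iteration
    for _ in range(n_iter):
        # Project samples onto genes, then back onto samples (v <- Z^T Z v).
        loadings = [sum(gi * vi for gi, vi in zip(g, v)) for g in z]
        v = [sum(w * g[i] for w, g in zip(loadings, z)) for i in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Fix the arbitrary sign so the eigengene follows the average expression.
    if pearson(v, average) < 0:
        v = [-x for x in v]
    return v


# Synthetic example: 71 samples, 31 men (1) and 40 women (0).
random.seed(0)
sex = [1] * 31 + [0] * 40
n_genes = 20

# Hypothetical module whose genes are shifted upwards in men.
module_a = [[1.5 * s + random.gauss(0, 1) for s in sex] for _ in range(n_genes)]

# Hypothetical module driven by a shared factor that is unrelated to sex.
latent = [random.gauss(0, 1) for _ in sex]
module_b = [[f + random.gauss(0, 1) for f in latent] for _ in range(n_genes)]

r_a = pearson(module_eigengene(module_a), sex)
r_b = pearson(module_eigengene(module_b), sex)
print(f"sex-linked module: r = {r_a:.2f}")
print(f"other module:      r = {r_b:.2f}")
```

The first synthetic module is built to depend on sex and the second is not, so the first correlation should come out much larger in absolute value. With real data I would repeat this for every module eigengene and every trait.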
I would really appreciate it if anybody could share their comments and guide me. Best