Hi!
I'm trying to identify hub genes within my significant modules. As in this post ("how to choose hub genes in wgcna"), I followed the tutorial and filtered on module membership (MM) > 0.8 and gene significance (GS) > 0.2, both as absolute values.
However, I then saw this post ("How to find the hub genes in the gene co-expression network constructed by WGCNA"), which recommends using the chooseTopHubInEachModule function instead.
With the tutorial's filtering scheme, I found no hubs in some modules and a lot in others (~150 hubs out of ~350 module genes). I also imposed a p-value cutoff on the gene significance (p < 0.05), as done here: https://support.bioconductor.org/p/133024/. My question is: is finding such a high number of hubs (it seems high to me, anyway!) typical? Could anyone provide some insight into the pros and cons of these two methods for hub selection?
I see papers using the >0.8 MM, >0.2 GS thresholds, for example: https://onlinelibrary.wiley.com/doi/full/10.1002/ehf2.13827
Thank you!
It is very common. Typically, networks built from RNA-seq data (or many other kinds of biological data) are not scale-free (link; link), so the "hub" of a module, especially a large one, is not a single gene but actually consists of hundreds of highly interconnected genes.
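To make the difference between the two selection schemes concrete, here is a minimal base-R sketch on simulated data. Everything in it is an assumption for illustration: the simulated single module, the variable names, the signed adjacency, and the soft-threshold power of 2. It does not call WGCNA itself; it only mimics the idea of each approach (threshold on MM/GS versus taking the single most connected gene).

```r
set.seed(1)
n_samples <- 40
n_genes   <- 60

# Simulated data (assumption): one module, samples in rows, genes in columns.
trait    <- rnorm(n_samples)
driver   <- 0.6 * trait + rnorm(n_samples)        # latent module signal, partly trait-related
loadings <- seq(1.5, 0.2, length.out = n_genes)   # strongly to weakly attached genes
datExpr  <- sapply(loadings, function(b) b * driver + rnorm(n_samples))
colnames(datExpr) <- paste0("gene", seq_len(n_genes))

# Module eigengene: first left singular vector of the scaled expression matrix,
# sign-aligned with the average scaled expression.
scaled <- scale(datExpr)
ME <- svd(scaled)$u[, 1]
if (cor(ME, rowMeans(scaled)) < 0) ME <- -ME

# Module membership (MM) and gene significance (GS) as plain Pearson correlations.
MM <- drop(cor(datExpr, ME))
GS <- drop(cor(datExpr, trait))

# Student t-test p-value for a correlation coefficient.
cor_pvalue <- function(r, n) {
  t_stat <- sqrt(n - 2) * r / sqrt(1 - r^2)
  2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)
}
GS_p <- cor_pvalue(GS, n_samples)

# Scheme 1: threshold filter (|MM| > 0.8, |GS| > 0.2, GS p-value < 0.05).
# The number of genes passing is not fixed and grows with module tightness.
tutorial_hubs <- colnames(datExpr)[abs(MM) > 0.8 & abs(GS) > 0.2 & GS_p < 0.05]

# Scheme 2: rank genes by intramodular connectivity and keep only the top one.
power <- 2                                  # illustrative value, not a recommendation
adj   <- ((1 + cor(datExpr)) / 2)^power     # signed adjacency
diag(adj) <- 0
kWithin <- rowSums(adj)
top_hub <- names(which.max(kWithin))

cat("Genes passing the MM/GS filter:", length(tutorial_hubs), "\n")
cat("Single most connected gene:", top_hub, "\n")
```

The threshold filter returns however many genes clear the cutoffs, so a tight module can yield a long list and a loose one none, while the connectivity ranking always returns exactly one gene per module, which is also what chooseTopHubInEachModule reports. If you want a shortlist rather than a single gene, sorting by `kWithin` (or by MM) and taking the top few is a reasonable middle ground.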