Hello,
I am having some trouble understanding how to choose a gene-selection method for identifying the genes in the top WGCNA modules that correlate most strongly with a clinical trait. I understand that WGCNA has a built-in measure called gene significance (GS), which is just the Pearson correlation between a gene's expression and the clinical trait, and that you can set a GS cut-off to pick out the genes in a module that correlate most strongly with the trait.
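To make the GS cut-off idea concrete, here is a minimal sketch in plain Python rather than WGCNA itself. The gene names, expression values, and the 0.8 cut-off are made up for illustration, and taking the absolute value of the correlation is an assumption about how the cut-off would be applied.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_by_gs(expr, trait, cutoff):
    """Return {gene: GS} for genes whose |GS| meets the cut-off."""
    gs = {gene: pearson(values, trait) for gene, values in expr.items()}
    return {gene: r for gene, r in gs.items() if abs(r) >= cutoff}

# Hypothetical toy data: 3 genes measured in 5 samples.
trait = [1.0, 2.0, 3.0, 4.0, 5.0]
expr = {
    "geneA": [2.0, 4.1, 6.2, 7.9, 10.0],  # tracks the trait closely
    "geneB": [5.0, 4.0, 3.1, 2.0, 0.9],   # strongly anti-correlated
    "geneC": [1.0, 3.0, 2.0, 3.0, 1.0],   # no linear relationship
}

selected = select_by_gs(expr, trait, cutoff=0.8)
for gene, r in sorted(selected.items()):
    print(gene, round(r, 3))
```

Note that each gene is scored on its own here, so two genes carrying the same information would both pass the cut-off.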
However, I understand that you can also use LASSO for this, keeping the genes that end up with non-zero coefficients. From what I have read so far, LASSO removes genes that DO correlate with the clinical trait but whose correlation is weak, or that are redundant in the sense that they share a similar correlation pattern with another gene. My understanding is that LASSO is better suited to situations where you have a large number of genes.
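The redundancy behaviour can be seen in a small sketch. This is a hand-written cyclic coordinate-descent LASSO in plain Python, not glmnet or any WGCNA function, and the data are an artificial extreme case: the trait is constructed to equal gene_a, and gene_b is a noisy near-copy of gene_a. The penalty value of 0.1 is arbitrary.

```python
def standardize(col):
    """Centre a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_cd(columns, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    `columns` holds standardized predictors, one list per gene.
    """
    n, p = len(y), len(columns)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every gene's contribution except gene j.
            resid = [
                y[i] - sum(beta[k] * columns[k][i] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(columns[j][i] * resid[i] for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam)
    return beta

# Hypothetical toy data: 3 genes, 6 samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]  # near-copy of gene_a
gene_c = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]  # mostly unrelated
columns = [standardize(g) for g in (gene_a, gene_b, gene_c)]
trait = columns[0]  # artificial: the trait tracks gene_a exactly

n = len(trait)
# For standardized columns, the mean of products is the Pearson correlation (GS).
gs = [sum(c[i] * trait[i] for i in range(n)) / n for c in columns]
beta = lasso_cd(columns, trait, lam=0.1)
print("GS:  ", [round(v, 2) for v in gs])
print("beta:", [round(b, 2) for b in beta])
```

In this toy case gene_b has a GS close to 1 and would pass any reasonable GS cut-off, yet its LASSO coefficient is exactly zero because gene_a already explains the trait.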
Am I missing anything? Why would one use LASSO over a GS cut-off for identifying top genes?
I tend to lean towards more network-based centrality approaches (like Kleinberg's hub score or betweenness) when trying to find influential genes in a module. As an intuitive way to approach it, centrality measures would find genes that are "most connected" to other genes in the module. Here's a brief example using igraph that is conceptually similar to the WGCNA function chooseTopHubInEachModule (which uses the adjacency matrix instead of the TOM, and a different centrality measure than Kleinberg's). I prefer exporting it to igraph to get more control over the analysis and to have more plotting options.
Here, tom is your topological overlap matrix, and module_features are the genes in your correlated module. igraph has a whole bunch of network measures to choose from if you prefer a different method.
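For readers without igraph at hand, the idea can be sketched in plain Python. This is not the igraph code referred to above; it is a hand-rolled power iteration for Kleinberg hub scores, scaled so the largest score is 1. The 4x4 tom values and the gene names in module_features are invented, and the summed-weight "connectivity" ranking is included only as a simpler comparison (as far as I know, it is close in spirit to what chooseTopHubInEachModule does on the adjacency matrix).

```python
def kleinberg_hub_scores(weights, n_iter=200):
    """Kleinberg hub scores by power iteration on W * W^T (max scaled to 1)."""
    n = len(weights)
    scores = [1.0] * n
    for _ in range(n_iter):
        # Authority update (W^T * hub), then hub update (W * authority).
        auth = [sum(weights[i][j] * scores[i] for i in range(n)) for j in range(n)]
        hubs = [sum(weights[i][j] * auth[j] for j in range(n)) for i in range(n)]
        top = max(hubs)
        scores = [h / top for h in hubs]
    return scores

# Hypothetical module of 4 genes with a symmetric, zero-diagonal TOM.
module_features = ["g1", "g2", "g3", "g4"]
tom = [
    [0.0, 0.6, 0.5, 0.4],
    [0.6, 0.0, 0.2, 0.1],
    [0.5, 0.2, 0.0, 0.1],
    [0.4, 0.1, 0.1, 0.0],
]

scores = kleinberg_hub_scores(tom)
connectivity = [sum(row) for row in tom]  # weighted degree, for comparison

ranked = sorted(zip(module_features, scores), key=lambda kv: kv[1], reverse=True)
for gene, score in ranked:
    print(gene, round(score, 3))
```

For a symmetric matrix like a TOM, the hub and authority scores coincide, so this amounts to ranking genes by the leading eigenvector of the weighted network.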
Disclaimer: I am not familiar with LASSO, but if LASSO really drops a gene just because it shares a similar correlation pattern with another gene, then I don't think it will work very well in this context. Genes in a module, especially the hub genes, are supposed to share similar correlation patterns with each other. That is how WGCNA works.
Maybe this post could be of some help