I am looking for a clear explanation of what WGCNA actually produces as output.
I've seen many papers using it, and I have read both the manual and the paper, but I still don't really understand what exactly hubs, hub genes and modules are, and in which situations one would use WGCNA.
Could someone please explain, for dummies, what results this tool actually returns and, most importantly, give an example of a situation in which a researcher would use it, i.e. what biological question could motivate using WGCNA? Thanks!
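The idea behind WGCNA is simply to view correlations between gene expression profiles as edge weights of a graph. That is, the correlation matrix is viewed as the adjacency matrix of a graph in which the nodes are genes and the edge weights are derived from the correlations (in WGCNA typically as the absolute correlation raised to a power, the so-called soft threshold, which shrinks weak correlations towards zero while keeping strong ones). From there, one can apply various graph algorithms to characterize the graph; what the output is depends on which algorithm is applied. Modules are another name for clusters, i.e. groups of densely interconnected genes, and are obtained by applying a graph-based clustering algorithm (the standard WGCNA workflow uses hierarchical clustering on a dissimilarity derived from the topological overlap matrix). Hubs are nodes that have a high degree, i.e. are highly connected; in a weighted network the degree (called connectivity in WGCNA) is the sum of a gene's edge weights, and hub genes are usually reported per module as the genes with the highest intramodular connectivity.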
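A minimal sketch of the idea in plain Python, assuming an unsigned network (adjacency = |Pearson correlation|^beta) and beta = 6 chosen only for illustration (WGCNA users typically pick the power with the package's pickSoftThreshold function). Hubs here are simply the genes with the highest whole-network connectivity; the gene names, toy expression values and helper functions are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)|**beta, zero diagonal."""
    genes = list(expr)
    return {
        g: {h: 0.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta for h in genes}
        for g in genes
    }


def connectivity(adj):
    """Weighted degree: the sum of each gene's edge weights."""
    return {g: sum(row.values()) for g, row in adj.items()}


def hubs(adj, top=2):
    """Genes with the highest connectivity (illustrative definition of a hub)."""
    k = connectivity(adj)
    return sorted(k, key=k.get, reverse=True)[:top]


# Toy data: gene -> expression across 6 samples (hypothetical values).
expr = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2, 4, 6, 8, 10, 12],
    "C": [1, 3, 2, 5, 4, 6],
    "D": [4, 1, 6, 2, 5, 3],
}
adj = soft_threshold_adjacency(expr)
k = connectivity(adj)
for gene, value in sorted(k.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: connectivity {value:.3f}")
print("hubs:", hubs(adj))
```

Raising the correlation to a power is what makes the network "weighted" rather than thresholded: D's weak correlations with the other genes all but vanish, so it ends up with by far the lowest connectivity.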
As to what questions can motivate the use of WGCNA, there are many. For example, you could be interested in finding groups of genes that are co-regulated; these might show up as modules in the graph. If you are interested in identifying "master regulators" of a process, you might find candidates among the hubs (keep in mind that a hub is only highly co-expressed with its neighbours, which does not by itself show that it regulates them). For examples of use, I suggest checking the literature. Here is one random example and here another one.
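To make "groups of co-regulated genes detected as modules" concrete, here is a rough sketch loosely following the standard WGCNA recipe: build the soft-thresholded adjacency, turn it into a topological overlap matrix (TOM), and cluster genes hierarchically (average linkage) on the dissimilarity 1 - TOM. As a simplifying assumption, a fixed cut height of 0.5 replaces WGCNA's dynamic tree cut; the gene names, data and helper functions are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def adjacency(expr, beta=6):
    """Unsigned soft-thresholded adjacency |cor|**beta with a zero diagonal."""
    genes = list(expr)
    return {g: {h: 0.0 if g == h else abs(pearson(expr[g], expr[h])) ** beta
                for h in genes} for g in genes}


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), l_ij = sum_u a_iu * a_uj."""
    genes = list(adj)
    k = {g: sum(adj[g].values()) for g in genes}
    tom = {}
    for i in genes:
        tom[i] = {}
        for j in genes:
            if i == j:
                tom[i][j] = 1.0
                continue
            # Shared neighbours u (u != i, j) strengthen the overlap of i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in genes if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


def average_linkage_modules(dissim, cut_height):
    """Agglomerative clustering with average linkage, stopping at cut_height."""
    clusters = [[g] for g in dissim]

    def dist(c1, c2):
        return sum(dissim[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, p, q = min(
            (dist(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > cut_height:
            break  # the remaining clusters are too dissimilar to merge
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
    return clusters


# Two hypothetical groups of co-expressed genes (A* and B*).
expr = {
    "A1": [1, 2, 3, 4, 5, 6], "A2": [2, 4, 6, 8, 10, 12], "A3": [1, 3, 2, 5, 4, 6],
    "B1": [4, 1, 6, 2, 5, 3], "B2": [8, 2, 12, 4, 10, 6], "B3": [4, 2, 6, 1, 5, 3],
}
tom = topological_overlap(adjacency(expr))
dissim = {i: {j: 1 - tom[i][j] for j in tom} for i in tom}
for n, module in enumerate(average_linkage_modules(dissim, cut_height=0.5), start=1):
    print(f"module {n}: {sorted(module)}")
```

TOM is used instead of the raw adjacency because it rewards genes that share many neighbours, which tends to make module boundaries less sensitive to a single noisy correlation.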
Note that the methods in WGCNA are generic: in principle, any similarity matrix could be used as input, not just correlations. Also, there are other graph algorithms besides those available in the WGCNA package.
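As an illustration that the network construction does not care where the similarities come from, here is a sketch in which the similarity function is just a parameter. Spearman rank correlation (implemented by hand, with average ranks for ties) is swapped in for Pearson purely as an example; the function names and toy profiles are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            r[order[pos]] = avg
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def adjacency_from_similarity(expr, similarity, beta=6):
    """Soft-thresholded adjacency built from any pairwise similarity function."""
    genes = list(expr)
    return {g: {h: 0.0 if g == h else abs(similarity(expr[g], expr[h])) ** beta
                for h in genes} for g in genes}


# y depends monotonically but non-linearly on x (hypothetical profiles).
expr = {"x": [1, 2, 3, 4, 5, 6], "y": [1, 4, 9, 16, 25, 36]}
a_p = adjacency_from_similarity(expr, pearson)
a_s = adjacency_from_similarity(expr, spearman)
print("Pearson-based edge weight: ", round(a_p["x"]["y"], 3))
print("Spearman-based edge weight:", round(a_s["x"]["y"], 3))
```

The rest of the pipeline (connectivity, TOM, clustering) only sees the adjacency, so the choice of similarity is a modelling decision that can be changed without touching the downstream steps.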
Another example of using WGCNA is to study the preservation of modules between cancer subtypes. If a module is poorly preserved between two cancer types, the co-expression relationships among its genes differ between them, which suggests that the function of those genes is affected differently in each cancer type.
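A deliberately simplified sketch of what "preservation" measures: the same gene module is checked in two datasets (say, two subtypes) by comparing its density, i.e. the mean soft-thresholded adjacency among its genes, and each gene's intramodular connectivity. This is an assumption-laden stand-in, not WGCNA's modulePreservation function (which summarizes several density and connectivity statistics into permutation-based Z scores); the data, gene names and helper functions are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def module_density(expr, module, beta=6):
    """Mean |cor|**beta over all gene pairs inside the module."""
    pairs = list(combinations(module, 2))
    return sum(abs(pearson(expr[g], expr[h])) ** beta for g, h in pairs) / len(pairs)


def intramodular_connectivity(expr, module, beta=6):
    """Sum of each gene's edge weights to the other genes of the module."""
    return {g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in module if h != g)
            for g in module}


module = ["A1", "A2", "A3"]  # module defined in the reference subtype
subtype_1 = {"A1": [1, 2, 3, 4, 5, 6], "A2": [2, 4, 6, 8, 10, 12], "A3": [1, 3, 2, 5, 4, 6]}
# Hypothetical second subtype in which A3 is no longer co-expressed with A1/A2.
subtype_2 = {"A1": [1, 2, 3, 4, 5, 6], "A2": [2, 4, 6, 8, 10, 12], "A3": [4, 1, 6, 2, 5, 3]}

for name, data in [("subtype 1", subtype_1), ("subtype 2", subtype_2)]:
    kin = intramodular_connectivity(data, module)
    print(name, "density:", round(module_density(data, module), 3),
          "connectivity:", {g: round(v, 3) for g, v in kin.items()})
```

A clear drop in density, or a gene whose intramodular connectivity collapses (A3 here), is the kind of signal that would make one call the module poorly preserved; a real analysis would also need a null distribution to judge whether the drop is larger than expected by chance.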
Very clear, thank you so much!