I am using WGCNA for the first time to identify gene co-expression modules across a time course with 40+ samples (RNA-Seq). I've removed lowly expressed genes and focused the analysis on the 15,000 most dynamically expressed genes. The scale-free topology fit and other indicators (sample clustering, etc.) look good and similar to other example datasets. The network is signed.
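For readers unfamiliar with what "signed" changes, here is a minimal Python sketch of the two adjacency definitions usually associated with WGCNA: signed, ((1 + cor) / 2)^power, and unsigned, |cor|^power. The correlation values and the powers (12 and 6) are illustrative assumptions, not settings taken from this thread.

```python
import numpy as np

def signed_adjacency(cor, power):
    """Signed-network adjacency: ((1 + cor) / 2) ** power."""
    return ((1.0 + np.asarray(cor)) / 2.0) ** power

def unsigned_adjacency(cor, power):
    """Unsigned-network adjacency: |cor| ** power."""
    return np.abs(np.asarray(cor)) ** power

# Illustrative correlations and powers (assumed values, not from the thread).
cors = np.array([-0.9, 0.0, 0.9])
signed = signed_adjacency(cors, power=12)
unsigned = unsigned_adjacency(cors, power=6)

for c, s, u in zip(cors, signed, unsigned):
    print(f"cor={c:+.1f}  signed={s:.3g}  unsigned={u:.3g}")
```

In the signed version a strongly anti-correlated pair gets an adjacency close to zero, whereas the unsigned version treats it the same as a strongly correlated pair.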
When I plot the "representative" merged eigengene module expression I get eigengene profiles for each module that agree well with my biological expectations. Nevertheless, when I look closer to investigate specific genes of interest in any particular module their expression pattern is completely discordant and sometimes entirely opposite to that of the "representative" eigengene.
Is this expected and, if so, to what extent? What could be causing this discordance? What parameters can I modify to make the co-expression clusters "tighter"? Is that necessary?
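One way to quantify how well individual genes track their module is to correlate each gene with the module eigengene (often called module membership or kME; WGCNA provides `moduleEigengenes` and `signedKME` for this). The Python sketch below is only an illustration on made-up data: the eigengene is taken as the first principal component of the standardized expression matrix, sign-aligned with the average expression, and the toy module deliberately contains one gene with the opposite profile.

```python
import numpy as np

def standardize(expr):
    """Z-score each gene (column) across samples (rows)."""
    return (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)

def module_eigengene(expr):
    """First principal component of the standardized samples x genes matrix."""
    z = standardize(expr)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # Align the arbitrary PC sign with the average standardized expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def module_membership(expr, eigengene):
    """Pearson correlation of every gene with the eigengene."""
    return np.array([np.corrcoef(expr[:, j], eigengene)[0, 1]
                     for j in range(expr.shape[1])])

# Toy module (hypothetical data): 5 rising genes and 1 falling gene, 12 samples.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 12)
rising = np.column_stack([t + rng.normal(0, 0.05, t.size) for _ in range(5)])
falling = (-t + rng.normal(0, 0.05, t.size))[:, None]
expr = np.hstack([rising, falling])

eigengene = module_eigengene(expr)
kme = module_membership(expr, eigengene)
discordant = np.where(kme < 0)[0]  # genes running opposite to the eigengene
print("kME per gene:", np.round(kme, 2))
print("discordant gene indices:", discordant.tolist())
```

Genes with low or negative values are the ones whose profiles disagree with the "representative" eigengene, so listing them per module gives a rough sense of how widespread the discordance is.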
I suspect one of the factors that might be resulting in this pattern is eigenmodule merging. I used a merging distance threshold of 0.2 to merge eigenmodules, but it might be a bit too strict, although I'm not sure if there is a better way to choose this threshold. I am in fact expecting large numbers of modules, as the conditions I'm comparing are fairly biologically different.
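To make the effect of the merging threshold concrete, the sketch below computes the eigengene dissimilarity 1 - cor for a few made-up eigengenes and lists the pairs that fall below a given cut height. It is a pairwise check only, not a reimplementation of the hierarchical clustering that `mergeCloseModules` performs, and the module names and eigengene vectors are hypothetical.

```python
import itertools
import numpy as np

def eigengene_distance(x, y):
    """Dissimilarity between two eigengenes: 1 - Pearson correlation."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def pairs_below(eigengenes, cut_height):
    """Return the module pairs whose eigengene distance is below cut_height."""
    close = []
    for a, b in itertools.combinations(eigengenes, 2):
        if eigengene_distance(eigengenes[a], eigengenes[b]) < cut_height:
            close.append((a, b))
    return close

# Orthonormal, zero-mean building blocks so the correlations are known exactly.
s = np.sqrt(0.5)
e1 = np.array([s, -s, 0, 0, 0, 0])
e2 = np.array([0, 0, s, -s, 0, 0])
e3 = np.array([0, 0, 0, 0, s, -s])

# Hypothetical eigengenes: cor(M1, M2) = 0.95, cor(M1, M3) = 0.85.
eigengenes = {
    "M1": e1,
    "M2": 0.95 * e1 + np.sqrt(1 - 0.95**2) * e2,
    "M3": 0.85 * e1 + np.sqrt(1 - 0.85**2) * e3,
    "M4": (e2 - e3) / np.sqrt(2),
}

merge_at_02 = pairs_below(eigengenes, cut_height=0.2)
merge_at_01 = pairs_below(eigengenes, cut_height=0.1)
print("below 0.2:", merge_at_02)
print("below 0.1:", merge_at_01)
```

In this toy case a cut height of 0.2 groups M1, M2 and M3, while 0.1 only pairs M1 with M2, which shows how quickly a higher cut height pulls in modules whose eigengenes are less alike.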
Find the module dendrogram below, as well as the merging cut-off (in red).
Any insights/pointers/suggestions would be greatly appreciated.
Thanks!
Hi Liz, and Devon,
I would be interested in hearing how this experience ended up, since I came across a similar problem. I would find it very useful to share experiences with WGCNA, since in the package many of the functions have options that are not very thoroughly documented.
My experience: I constructed a signed network, extracted modules with dynamicTreeCutting, then wanted to merge closely related modules (since typically dynamicTreeCutting tends to give a large number of highly correlated modules).
I proceeded with the default merging approach, but in subsequent analyses on the modules (sanity checks, let's call them) I figured that the merged modules were a bit odd; in particular what I found worrying was a very poor correlation between gene significances and module memberships for given trait associations, which is what I was ultimately interested in (in addition, global in-module expression patterns weren't very consistent, as I think I understand you experienced as well).
Going back and forth with distinct parameters and reading through all function descriptions, I discovered the possibility of merging modules using 1 - abs(correlation) as distance measure, and this seems to give results that are much more robust (at least in terms of expression patterns and gene significance/module membership correlations).
In my mind this makes sense, given that we are working with signed networks, but I never really clarified whether this makes sense only to me or it's truly a valid option.
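As a small illustration of the two distance measures being compared, the Python sketch below evaluates 1 - cor and 1 - |cor| on a made-up pair of anti-correlated eigengenes. In WGCNA this choice may correspond to the `useAbs` argument of `mergeCloseModules`, but the thread does not name the option, so treat that mapping as an assumption.

```python
import numpy as np

def signed_distance(x, y):
    """1 - cor: anti-correlated eigengenes end up far apart (up to 2)."""
    return 1.0 - np.corrcoef(x, y)[0, 1]

def abs_distance(x, y):
    """1 - |cor|: anti-correlated eigengenes end up close together."""
    return 1.0 - abs(np.corrcoef(x, y)[0, 1])

# Hypothetical eigengenes built so that cor(x, y) = -0.9 exactly.
s = np.sqrt(0.5)
e1 = np.array([s, -s, 0, 0])
e2 = np.array([0, 0, s, -s])
x = e1
y = -0.9 * e1 + np.sqrt(1 - 0.9**2) * e2

d_signed = signed_distance(x, y)
d_abs = abs_distance(x, y)
print("1 - cor   =", round(d_signed, 3))  # 1.9
print("1 - |cor| =", round(d_abs, 3))     # 0.1
```

With a cut height of 0.2, this pair would stay separate under 1 - cor but would be a merge candidate under 1 - |cor|, so the two measures can lead to quite different merged modules.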
I hope to hear more about this!
Marge
Hi Marge, hopefully Liz will reply with how this worked for her. For my part, I too have needed to monkey around with things a bit to get seemingly meaningful results (and then the modules that correlate with what I'm interested in generally end up being just a superset of DE genes found with DESeq2, so I don't really use WGCNA much anymore). Intuitively, at least, if a module is correlated with a trait, then I too would generally expect to see a rough correlation between module membership and significance. I think that I had previously used 1-abs(correlation) as the distance measure myself, so I guess that made sense to me too (for whatever that's worth!).
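The sanity check discussed here (module membership versus gene significance) can be sketched as follows in Python. The data are simulated, the eigengene is simply taken as given rather than computed from the module, and gene significance is defined here as the absolute correlation of a gene with the trait; all of these are assumptions made for illustration.

```python
import numpy as np

def cor_with(expr, vec):
    """Pearson correlation of each gene (column of expr) with vec."""
    return np.array([np.corrcoef(expr[:, j], vec)[0, 1]
                     for j in range(expr.shape[1])])

# Simulated module (hypothetical): genes follow the eigengene to varying degrees.
rng = np.random.default_rng(1)
n_samples, n_genes = 30, 50
trait = rng.normal(size=n_samples)
eigengene = trait + 0.3 * rng.normal(size=n_samples)  # trait-associated module
weights = np.linspace(0.1, 0.9, n_genes)              # how tightly each gene tracks it
noise = rng.normal(size=(n_samples, n_genes))
expr = eigengene[:, None] * weights + noise * (1 - weights)

mm = cor_with(expr, eigengene)      # module membership
gs = np.abs(cor_with(expr, trait))  # gene significance (assumed definition)
mm_gs_cor = np.corrcoef(mm, gs)[0, 1]
print("cor(MM, GS) =", round(mm_gs_cor, 2))
```

In a coherent, trait-associated module the two quantities should rise together, so a weak correlation between them after merging is a hint that the merged module mixes genes with different behaviour.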
Hi Marge,
As I said in my response to Devon, I ended up merging at a much lower threshold (0.1), which effectively resulted in only three eigengenes merging (tan, brown and salmon in the dendrogram above). In fact, I get very meaningful results with this approach. I do not have correlations to traits (as this is a time course), so I'm more interested in eigenmodules that are overexpressed in any particular time frame.
In my hands the eigengene expression patterns correlate beautifully with the biological expectations and show differential gene enrichments that make sense in our biological framework. To get a handle on what to focus on in the network, I'm now combining hub gene identification with differential gene expression layered on top of each module, to see if there are any particular genes that we can focus on as drivers of these gene expression changes.
I have also done some testing in a different system where we have far fewer samples (~30), and the patterns do not make much sense. It seems WGCNA might perform much better the more samples you have, although the gene expression measurements were done on two different platforms, so that might also be affecting the results.
Hope this helps,
Liz
Dear Liz,
Thanks a lot for the feedback, it definitely helps!
Marge
Hi Devon,
Thanks for the feedback. To start with, it's great to hear that I am not the only one that stumbled upon the abs option ;-)
The appeal of WGCNA over differential expression is (at least for me) the possibility (in principle) of extrapolating suggestive regulatory links from the connectivity properties of the modules. Also, the module seems the perfect context in which to test pathway enrichments and assign function with a guilt-by-association approach. That is the theory: in practice, I have to say honestly that I have not so far seen very clean and clear results (and it's been a while, both in terms of time and in terms of data).