Question

Clarification on WGCNA Module-Trait Correlations, Interpretation, and Functional Annotation

12 months ago

Shaheer Syed • 0

Hello All,

I began a postbac a few months ago and am working on a project involving bulk RNA-seq data from over 50 patient samples (~12,012 genes). My plan is to first correlate WGCNA module eigengenes with various clinical traits, and then correlate the modules from the top module-trait pairings with pre-defined modules for functional annotation. As I delve deeper into this analysis, I want to make sure my understanding of certain key WGCNA concepts and interpretations is accurate. I would greatly appreciate your insights on the following points:

Module Eigengenes:

My understanding is that a module eigengene represents the first principal component of the gene expression data for a given module, capturing the overall expression pattern of the genes within that module. Is this a correct and comprehensive understanding?
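
For concreteness, here is a minimal sketch of that definition in Python with NumPy. WGCNA itself is an R package, so this only mirrors the general idea (standardize each gene, take the first principal component across samples) and is not the package's own code. The function name `module_eigengene` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def module_eigengene(expr):
    """expr: samples x genes array for ONE module (hypothetical input layout).

    Returns the first principal component across samples and the
    fraction of variance it explains.
    """
    # Standardize each gene (column) to mean 0 and unit variance
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # Thin SVD; the first left singular vector is the per-sample summary profile
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    return eigengene, var_explained

# Simulated module: 50 samples, 20 genes sharing one underlying signal
rng = np.random.default_rng(0)
signal = rng.normal(size=50)
loadings = rng.uniform(0.5, 1.5, size=20)
expr = signal[:, None] * loadings + 0.3 * rng.normal(size=(50, 20))

me, ve = module_eigengene(expr)
```

The sign of `me` returned by the SVD is not fixed, so only its absolute correlation with `signal` is meaningful at this stage.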

Signed vs. Unsigned Networks:

Signed Networks: In a signed network, the sign of the correlation between genes is preserved, meaning that modules consist of genes with consistent expression patterns (modules are made up of genes that are all positively correlated with one another, as per the WGCNA glossary). When correlating module eigengenes to clinical traits, a positive correlation suggests that an increase in the expression of the genes within that module is associated with an increase in the trait. Conversely, a negative correlation suggests that an increase in gene expression is associated with a decrease in the trait. Is it safe to make these statements regarding gene expression?
Unsigned Networks: In unsigned networks, the absolute value of the correlation is used, so both positive and negative correlations between genes are treated similarly. This could result in modules containing genes with opposite expression patterns. I assume that in this case, the sign of the correlation between a module eigengene and a clinical trait might be arbitrary and not reflect consistent gene behavior within the module.
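
To make the difference concrete, the sketch below applies the standard signed and unsigned WGCNA adjacency transforms to a few correlation values, in Python with NumPy. The function name `adjacency` is hypothetical, and the soft-thresholding powers (12 and 6) are illustrative choices, not values taken from this dataset.

```python
import numpy as np

def adjacency(cor, beta, signed):
    """Soft-thresholded adjacency from pairwise gene correlations."""
    if signed:
        # Signed: maps cor in [-1, 1] to [0, 1], so strong negative
        # correlations end up with adjacency near 0
        return ((1.0 + cor) / 2.0) ** beta
    # Unsigned: only the magnitude of the correlation matters
    return np.abs(cor) ** beta

cor = np.array([0.9, -0.9, 0.0])
signed_adj = adjacency(cor, beta=12, signed=True)
unsigned_adj = adjacency(cor, beta=6, signed=False)
```

In the unsigned case, the pairs with correlations 0.9 and -0.9 receive the same adjacency, which is how anti-correlated genes can land in the same module. In the signed case, the -0.9 pair is effectively disconnected.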

Could you please confirm if my understanding of these network types and their implications is accurate? Are there any nuances or additional considerations that I should be aware of?

Correlating Module Eigengenes to Disease Activity Measures:

I am working with different types of disease activity measures, which have opposite scales. I’ve provided two examples below:

Disease Activity Measure 1 (DAM1): A higher score indicates more severe disease activity, while a lower score indicates less disease activity.
Disease Activity Measure 2 (DAM2): This measure is inversely scaled compared to DAM1, where a higher score indicates less severe disease activity (better health), and a lower score indicates more severe disease activity.

My Current Interpretation:

In a signed network:

A positive correlation between a WGCNA module eigengene and DAM1 would indicate that as disease activity worsens (higher DAM1 score), the expression of the genes within that module increases.
A negative correlation would suggest the inverse, where disease activity decreases (lower DAM1 score) as the expression of the genes within the module increases.

For DAM2:

A positive correlation (without adjustment) would imply that as disease activity decreases (higher DAM2 score), the module’s gene expression increases, potentially suggesting involvement in recovery or protective processes.
A negative correlation (without adjustment) would suggest that as disease activity worsens (lower DAM2 score), the module’s gene expression increases.

Concern:

Given that DAM2 is scaled oppositely to DAM1, interpreting the correlation directly could be counterintuitive. To maintain consistency, I'm considering flipping the sign of the correlation for DAM2 and others like it (by multiplying the correlation by -1). This adjustment would allow for a consistent interpretation across both measures:

A positive correlation with both DAM1 and the adjusted DAM2 would indicate that the expression of the genes within that module increases with worsening disease activity.
A negative correlation with both measures would suggest that the module is associated with improving disease activity.
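
As a quick check on the proposed adjustment, the sketch below shows that multiplying the correlation by -1 is the same as negating the DAM2 scores before correlating. It uses Python with NumPy and simulated values; the variable names and the simulated relationship between the eigengene and DAM2 are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
me = rng.normal(size=50)                   # hypothetical module eigengene
dam2 = -0.6 * me + rng.normal(size=50)     # hypothetical score, higher = healthier

r_raw = np.corrcoef(me, dam2)[0, 1]             # correlation with DAM2 as recorded
r_flipped_trait = np.corrcoef(me, -dam2)[0, 1]  # negate the trait, then correlate
r_negated = -r_raw                              # or negate the correlation itself
```

Both routes give the same number, and the magnitude (and therefore any p-value) is unchanged, so the adjustment only relabels the direction.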

Additional Plan for Functional Annotation:

After identifying the top WGCNA-clinical trait pairs, I plan to functionally annotate them by correlating these WGCNA module eigengenes to pre-defined modules (such as BloodGen3) that have established functions. My approach is to assess correlations between the WGCNA module eigengenes and the annotated module eigengenes. I assume I will need to carefully interpret the sign of the correlation in this context as well. Ideally, I am trying to generate a heatmap similar to Figure 1 here: https://pubmed.ncbi.nlm.nih.gov/29908154/
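
A minimal sketch of the matrix behind such a heatmap, in Python with NumPy, assuming both sets of eigengenes are available as samples x modules arrays over the same samples. The function name `cross_correlation`, the array shapes, and the random data are hypothetical. This illustrates the plan described above and is not the method used in the linked paper.

```python
import numpy as np

def cross_correlation(a, b):
    """Pearson correlation between every column of a and every column of b.

    a: samples x p array, b: samples x q array (same samples, same order).
    Returns a p x q matrix.
    """
    az = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    bz = (b - b.mean(axis=0)) / b.std(axis=0, ddof=1)
    return az.T @ bz / (a.shape[0] - 1)

rng = np.random.default_rng(2)
wgcna_mes = rng.normal(size=(50, 4))   # hypothetical: 4 WGCNA module eigengenes
ref_mes = rng.normal(size=(50, 6))     # hypothetical: 6 pre-defined module eigengenes

corr = cross_correlation(wgcna_mes, ref_mes)   # rows: WGCNA modules, columns: reference modules
```

Each cell's sign depends on the orientation of both eigengenes, so the two sets need to be oriented by the same convention before the colors are interpreted.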

Clarification Needed:

Is it appropriate to correct the sign of the correlation for DAM2 in this context? My aim is to ensure that the biological interpretation of the correlations remains consistent across both disease activity measures.
What are your thoughts on the biological implications of the correlation before and after sign correction? Specifically, would the unadjusted correlation with DAM2 still provide meaningful insights, or is sign correction necessary for clear interpretation?
Does my understanding of how to interpret module eigengenes and their correlation with clinical traits in both signed and unsigned networks align with best practices? Are there any additional nuances or best practices I should consider when conducting this analysis?
A previous postbac in the lab mentioned the arbitrary assignment of correlation signs in WGCNA, but I don’t know why they claimed this. Could anyone clarify what this means? How should this influence the interpretation of correlations when using them for functional annotation against traits and pre-defined modules?
Is there anything I said here that is completely wrong? I am still trying to piece things together about my project and WGCNA and any insight will help.

I want to ensure that I’m applying WGCNA correctly and interpreting the results in a biologically meaningful way. Any expertise and feedback would be invaluable in guiding this process.

Thank you!

Best regards,
[S]

WGCNA • 1.7k views

updated 12 months ago by LChart 5.0k • written 12 months ago by Shaheer Syed • 0

score 0 · Answer 1 · 2024-08-09

Is it appropriate to correct the sign of the correlation for DAM2 in this context?

If DAM2 is an "established" metric (as in Huntington's, where striatal volume decreases while ventricle volume increases), then there's no good reason to invert it, since it's already known that DAM2 and DAM1 are inversely related. If DAM2 is novel or arbitrary, sure, you can reverse the signs.

What are your thoughts on the biological implications of the correlation before and after sign correction?

Biological interpretation should not change as signs change: "More abundant as condition worsens" should be an invariant.

Does my understanding of how to interpret module eigengenes and their correlation with clinical traits in both signed and unsigned networks align with best practices?

The definition of a module eigengene does not change between signed and unsigned networks: it is simply the first principal component of expression. With 50 patient samples, best practice essentially says that you should not use unsigned networks. That should simplify your analysis. (To be sure: signed weighted correlation and signed topological overlap.)

A previous postbac in the lab mentioned the arbitrary assignment of correlation signs in WGCNA, but I don’t know why they claimed this. Could anyone clarify what this means?

Eigenvectors are unique only up to sign: if v is an eigenvector then so is -v, and which one you find depends on the particular solver. However, by default, moduleEigengenes has align="along average", which computes the correlation between the eigengene and the module average expression and ensures that it is positive. This makes the definition of "module eigengene" unique, though potentially unstable in unsigned networks, where module averages may be near 0. (Again: with N=50, don't bother with unsigned networks.)
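
A minimal sketch of that alignment step in Python with NumPy, for illustration only; it is not WGCNA's R code. It assumes the module average is taken over standardized expression, and the function name `align_along_average` and the simulated data are hypothetical.

```python
import numpy as np

def align_along_average(eigengene, expr):
    """Flip the eigengene if it correlates negatively with the module average.

    expr: samples x genes array for one module.
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    avg = z.mean(axis=1)                      # per-sample average of standardized genes
    r = np.corrcoef(eigengene, avg)[0, 1]
    return -eigengene if r < 0 else eigengene

# Simulated module in which all genes follow one shared signal
rng = np.random.default_rng(3)
signal = rng.normal(size=50)
expr = signal[:, None] + 0.3 * rng.normal(size=(50, 20))

z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
u, s, vt = np.linalg.svd(z, full_matrices=False)
pc1 = u[:, 0]                                 # sign depends on the solver

aligned_a = align_along_average(pc1, expr)
aligned_b = align_along_average(-pc1, expr)   # same result whichever sign the solver returned
```

When a module's average sits near zero, as can happen in unsigned networks, `r` is close to zero and its sign is driven by noise, which is the instability mentioned above.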

Is there anything I said here that is completely wrong?

Not at first glance. Sounds like you've carefully read the Horvath/Langfelder/Oldham papers.