Hi all. Sorry in advance for any gross mistakes. I'm a novice in this field as is clear from my username.
In most applications of Weighted Gene Co-expression Network Analysis (WGCNA), I see that the authors assess the correlation of module eigengenes (which are numeric variables) with various categorical variables (such as disease status). Check the plot in the following link as an example:
However, I'm not sure how this correlation is assessed. I have searched various forums and could not find a standard approach. My questions:
- I have seen people recommending linear regression for this purpose, with the eigengene expression as the dependent variable and the categorical variable as the independent variable, and then using the square root of the R-squared as a measure of association similar to the Pearson correlation coefficient. Is this method acceptable? If so, how do we determine whether the correlation is positive or negative?
- I have seen others saying it is possible to use logistic regression (with the categorical variable as the dependent variable and the eigengene as the independent variable) for this purpose. If this is possible, where do we get correlation coefficients from?
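To make the linear regression idea concrete, here is a minimal sketch of what I understand it to mean, for a two-level variable only. Everything in it is my own assumption: the data are made up, the names `fit_simple_regression` and `signed_root_r_squared` are hypothetical, and nothing is taken from the WGCNA package or from any paper. It fits eigengene ~ group with the group coded 0/1, takes the square root of the R-squared, and borrows the sign from the fitted slope.

```python
from math import sqrt


def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    return intercept, slope, r_squared


def signed_root_r_squared(x, y):
    """Square root of R-squared, carrying the sign of the regression slope."""
    _, slope, r_squared = fit_simple_regression(x, y)
    magnitude = sqrt(max(r_squared, 0.0))  # guard against tiny negative rounding
    return magnitude if slope >= 0 else -magnitude


# Made-up example: 4 non-treated samples (0) and 4 treated samples (1).
treated = [0, 0, 0, 0, 1, 1, 1, 1]
eigengene = [-0.30, -0.20, -0.25, -0.10, 0.15, 0.30, 0.20, 0.20]

intercept, slope, r_squared = fit_simple_regression(treated, eigengene)
association = signed_root_r_squared(treated, eigengene)

print("slope (treated mean minus non-treated mean):", slope)
print("R-squared:", r_squared)
print("signed sqrt(R-squared):", association)
```

With a 0/1 predictor the slope is just the difference between the two group means, so in this sketch the sign says which group has the higher eigengene on average. For a variable with more than two levels there is no single slope, so I don't see how to assign a sign this way.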
I have also seen people saying it is OK to code the categorical variable numerically (e.g., treated = 1 and non-treated = 0) and then use the Pearson correlation. I suspect that this is the method used in the papers I see every day (am I right?). But is this statistically sound? As far as I know, the Pearson correlation measures whether one variable tends to increase or decrease as the other increases. However, in this case, 0 and 1 only code for categories and do not represent an increase or decrease.
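Here is a small sketch of the 0/1 coding approach on made-up numbers, so it is clear what I am asking about. The sample labels, the eigengene values, and the helper names `pearson` and `group_mean` are all hypothetical and not from any real dataset or package. It computes the Pearson correlation between the coded variable and the eigengene, then repeats it with the coding swapped.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def group_mean(values, labels, label):
    """Mean of the values whose label equals the given label."""
    selected = [v for v, lab in zip(values, labels) if lab == label]
    return sum(selected) / len(selected)


# Made-up example: one module eigengene measured in 8 samples.
status = ["control"] * 4 + ["treated"] * 4
eigengene = [-0.31, -0.12, -0.22, -0.05, 0.18, 0.25, 0.09, 0.18]

# Two arbitrary numeric codings of the same two categories.
treated_as_1 = [1 if s == "treated" else 0 for s in status]
treated_as_0 = [1 - c for c in treated_as_1]

r_treated_1 = pearson(treated_as_1, eigengene)
r_treated_0 = pearson(treated_as_0, eigengene)
mean_diff = group_mean(eigengene, status, "treated") - group_mean(eigengene, status, "control")

print("r with treated = 1:", r_treated_1)
print("r with treated = 0:", r_treated_0)
print("treated mean minus control mean:", mean_diff)
```

In this sketch the magnitude of r does not depend on which category gets the 1, only the sign flips, and the sign with treated = 1 matches the sign of the difference in group means. So for two categories the "direction" seems to mean only "which group has the higher eigengene", which is part of why I am unsure how to interpret it for variables with more than two categories.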
Is there any other standard approach used for this purpose that I'm not aware of? I have seen people recommending other approaches for assessing correlation of categorical and continuous variables (e.g., point-biserial correlation) but I doubt these are the methods used in the WGCNA literature.
Thanks in advance for your time and advice.