Hello all,
I have a dataset on which I am performing WGCNA analysis. The tutorial materials on Steve Horvath's website are excellent, but one question I have not found an answer to is this: when you have a trait that is not a continuous variable but rather a binary state, such as diseased/not diseased, and you are trying to correlate the modules to it, how should you encode the trait?
Should you assign a 0 and a 1 for non-diseased and diseased respectively? What about a 1 and a 2? Would both work equally well?
When so encoded, how would the step where you correlate modules to traits work?
If you reversed this coding so that non-diseased was encoded as 2 and diseased as 1, would you expect modules that negatively correlate with being diseased to show up as a positive correlation?
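For what it's worth, the coding questions above can be checked numerically. The following is only a sketch in plain Python (not WGCNA code), assuming the module-trait step is an ordinary Pearson correlation between a module eigengene and the trait vector; the eigengene values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical module eigengene for 6 samples (made-up numbers).
eigengene = [0.10, -0.20, 0.05, 0.40, 0.55, 0.30]
diseased = [False, False, False, True, True, True]

code_01 = [1 if d else 0 for d in diseased]   # non-diseased = 0, diseased = 1
code_12 = [2 if d else 1 for d in diseased]   # non-diseased = 1, diseased = 2
code_rev = [1 if d else 2 for d in diseased]  # reversed: non-diseased = 2, diseased = 1

r_01 = pearson(eigengene, code_01)
r_12 = pearson(eigengene, code_12)
r_rev = pearson(eigengene, code_rev)
print(r_01, r_12, r_rev)
```

Under that assumption, the 0/1 and 1/2 codings give the same correlation, because Pearson correlation is unchanged by shifting or positively rescaling a variable, and the reversed coding gives the same magnitude with the opposite sign.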
Also, what if you had three conditions: Control, Traumatized, Diseased? Would encoding them as 1, 2, 3 make sense? Or would you want to limit your test to just those samples that were traumatized or diseased, if you were primarily interested in the traumatized vs. diseased comparison?
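To make the three-condition options concrete: coding the groups as 1, 2, 3 treats them as ordered and equally spaced when correlated, whereas binary indicators avoid that assumption. Below is a rough Python sketch of two such indicator codings; the helper names `pairwise_indicator` and `one_vs_rest` are hypothetical and not part of WGCNA, and the sample labels are made up.

```python
# Hypothetical condition labels for 6 samples.
conditions = ["Control", "Traumatized", "Diseased",
              "Control", "Diseased", "Traumatized"]

def pairwise_indicator(labels, reference, target):
    """1 for target, 0 for reference, None for samples in neither group."""
    return [1 if lab == target else 0 if lab == reference else None
            for lab in labels]

def one_vs_rest(labels, target):
    """1 for target, 0 for every other sample."""
    return [1 if lab == target else 0 for lab in labels]

# Diseased vs. Traumatized only; Control samples are marked as missing.
dis_vs_traum = pairwise_indicator(conditions, "Traumatized", "Diseased")
# Diseased vs. all other samples.
dis_vs_rest = one_vs_rest(conditions, "Diseased")

print(dis_vs_traum)
print(dis_vs_rest)
```

Samples marked `None` in the pairwise coding would have to be dropped, or treated as missing, before correlating that trait with the module eigengenes.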
I'd appreciate any advice you can offer!
For calculating the correlation p-value, the tutorial uses Student's t-test [corPvalueStudent()]. Should this be modified when using a categorical/binary variable?
Hi Arindam,
you can use the corPvalueStudent function
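One way to see why this works for a binary trait is sketched below in plain Python (not WGCNA code), assuming the p-value comes from the usual t statistic for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. For a 0/1 trait that statistic coincides with the pooled-variance two-sample t statistic comparing the two groups. The eigengene values are made up.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cor_t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n samples."""
    return r * sqrt((n - 2) / (1 - r * r))

def pooled_t_statistic(group0, group1):
    """Two-sample t statistic (mean of group1 minus group0), pooled variance."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss = sum((v - m0) ** 2 for v in group0) + sum((v - m1) ** 2 for v in group1)
    pooled_var = ss / (n0 + n1 - 2)
    return (m1 - m0) / sqrt(pooled_var * (1 / n0 + 1 / n1))

# Hypothetical module eigengene and a 0/1 disease trait (made-up numbers).
eigengene = [0.10, -0.20, 0.05, 0.40, 0.55, 0.30]
trait = [0, 0, 0, 1, 1, 1]

t_cor = cor_t_statistic(pearson(eigengene, trait), len(trait))
t_two_sample = pooled_t_statistic(
    [e for e, t in zip(eigengene, trait) if t == 0],
    [e for e, t in zip(eigengene, trait) if t == 1],
)
print(t_cor, t_two_sample)
```

Both statistics have n - 2 degrees of freedom, so under the stated assumption the p-value is the one an equal-variance two-group t-test would give.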