Hi all, I am using WGCNA for co-expression analysis of time-series data. I have 48 samples in total (16 time points with 3 replicates each). I have identified modules based on gene correlation across all samples, and I now want to associate these modules with a trait of interest. I have grouped the 16 time points into 5 time zones, so I want to see which modules are associated with each of these 5 time zones. My trait data is:
samples timezone1 timezone2 timezone3 timezone4 timezone5
dark11 1 0 0 0 0
dark11_1 1 0 0 0 0
dark9 1 0 0 0 0
dark7 0 1 0 0 0
dark7_1 0 1 0 0 0
dark5 0 0 1 0 0
dark5_1 0 0 1 0 0
and so on
Am I doing this right, or is there another way to associate modules with categorical traits? After identifying the modules related to each time zone, I want to compute intramodular connectivity and identify hub genes.
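For readers following along, here is a minimal sketch of how a 0/1 trait table of this shape could be built from a single time-zone factor in base R. The sample names are the ones shown above; the zone assignments and the object names `zone` and `traits` are assumptions made for illustration, not the poster's actual code.

```r
# Sample names taken from the table above; zone labels are assumed for illustration
samples <- c("dark11", "dark11_1", "dark9", "dark7", "dark7_1", "dark5", "dark5_1")
zone <- factor(c(1, 1, 1, 2, 2, 3, 3), levels = 1:5)

# One indicator column per time zone (no intercept, so every level gets a column)
traits <- model.matrix(~ 0 + zone)
colnames(traits) <- paste0("timezone", levels(zone))
rownames(traits) <- samples

print(traits)
```

Each row has exactly one 1, which matches the layout in the question. Columns for zones with no samples in this toy subset are all zero.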
Thanks for the reply, Kevin! Does pink_module in your command contain only the gene names, or the expression values of the genes? And does mydata correspond to the trait data that I have shown in the table?
In my code, pink_module would contain the module values returned by WGCNA, with 1 value per sample.
Yes, mydata would be of the form that you have shown, but would also include extra columns for the modules (pink, blue, green, etc.).
Many thanks, Kevin, it worked perfectly. Sorry to bother you again, but I am getting negative values for scale independence (y-axis). I have also removed genes that were not suitable for the analysis. Does the negative scale independence show that my data do not follow a scale-free topology, or does it indicate something else? Is there anything wrong with my data? I chose a soft-thresholding power of 12 according to the WGCNA FAQ.
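As background to the question about negative values: the scale independence axis in the WGCNA tutorial plots is, as far as I know, a signed R^2, that is -sign(slope) * R^2 from a linear fit of log10(p(k)) against log10(k). The sketch below is a simplified base-R re-implementation written for illustration, not WGCNA's own code. The function name `signed_sft_fit` and the toy connectivity vectors are hypothetical.

```r
# Simplified signed scale-free fit (illustrative only, not WGCNA's implementation)
signed_sft_fit <- function(k, n_breaks = 10) {
  bins   <- cut(k, breaks = n_breaks)            # discretise connectivity
  mean_k <- tapply(k, bins, mean)                # mean connectivity per bin
  p_k    <- as.numeric(table(bins)) / length(k)  # fraction of genes per bin
  keep   <- !is.na(mean_k) & p_k > 0             # drop empty bins
  log_k  <- log10(mean_k[keep])
  log_p  <- log10(p_k[keep])
  fit    <- lm(log_p ~ log_k)
  slope  <- unname(coef(fit)[2])
  r2     <- summary(fit)$r.squared
  c(slope = slope, r2 = r2, signed_fit = -sign(slope) * r2)
}

# Toy data: many weakly connected genes, few hubs (frequency falls as k grows)
k_hub_like <- rep(1:10, times = 2^(9:0))
# Toy data: the reverse pattern (frequency rises as k grows)
k_reversed <- rep(1:10, times = 2^(0:9))

print(signed_sft_fit(k_hub_like))
print(signed_sft_fit(k_reversed))
```

Under this definition, a negative value means the fitted slope is positive, so highly connected genes are more frequent than weakly connected ones at that power. Whether that is what is happening in the data above would need to be checked on the actual fit table.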
Dear Kevin,
I was searching for threads on associating modules with categorical traits and found your answers interesting. Sorry to jump in, but this is related to what I am doing. With the following code that you suggested,
summary(glm(timezone1 ~ pink_module, data = mydata, family = binomial(link = 'logit')))
is it possible to fit a multiple logistic regression with all the modules as covariates? Or must we fit one module at a time because they are independent of each other? My multiple regression output was really strange: the z value was 0 and the p-value was 1 for all modules.
Also, with the following correlation test,
cor(as.numeric(mydata$pink_module), mydata$timezone1)
cor.test(as.numeric(mydata$pink_module), mydata$timezone1)
how robust or accurate is a numeric correlation, given that the response is binary, coded as 1 and 0?
I fitted a logistic regression with one module at a time and also ran the numeric correlation as you suggested, but the p-values were different. None of the modules was significant in the binary logistic regression. However, with the numeric correlation, one of my modules had a p-value less than 0.05.
I am stuck with this. I hope you can provide a bit of insight into my problem.
Kind Regards,
Synat
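The comparison described in this post can be reproduced on simulated data. A Pearson correlation against a 0/1 variable is the point-biserial correlation, and its test is equivalent to an equal-variance two-sample t-test, whereas summary(glm(...)) reports a Wald z-test, so the two p-values need not agree in small samples. In the sketch below, the column names timezone1 and pink_module follow the thread; the values, the group sizes (9 of 48 samples in time zone 1) and the effect size are assumptions.

```r
set.seed(42)
n <- 48

# Simulated stand-in for the thread's mydata (values are hypothetical)
mydata <- data.frame(
  timezone1   = rep(c(1, 0), times = c(9, 39)),
  pink_module = rnorm(n)
)
# Shift the module values for samples in time zone 1 (assumed effect)
in_zone <- mydata$timezone1 == 1
mydata$pink_module[in_zone] <- mydata$pink_module[in_zone] + 0.8

# Point-biserial correlation test
p_cor <- cor.test(mydata$pink_module, mydata$timezone1)$p.value

# Equal-variance two-sample t-test on the same data
p_t <- t.test(pink_module ~ timezone1, data = mydata, var.equal = TRUE)$p.value

# Single-module logistic regression (Wald test on the coefficient)
fit   <- glm(timezone1 ~ pink_module, data = mydata,
             family = binomial(link = "logit"))
p_glm <- coef(summary(fit))["pink_module", "Pr(>|z|)"]

print(c(cor_test = p_cor, t_test = p_t, glm_wald = p_glm))
```

The first two p-values should coincide, while the Wald p-value usually differs somewhat. As for the multi-module fit with z values of 0 and p-values of 1, one possible explanation (an assumption, not verified on this data) is complete separation: with few samples and many module covariates, the fitted probabilities reach 0 or 1 and the standard errors become very large.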