Hello, I am performing WGCNA to relate genes to specific traits. While constructing a signed network (network type and TOM type both set to "signed"), I couldn't decide on the appropriate power. So I constructed networks with 7 different powers (6, 7, 8, 9, 10, 12, 14) and created a heat map of module-trait correlations for each power. The R-squared values from network construction increase from 0.5 (power 6) to 0.8 (power 14), but mean connectivity falls after a power of 7.
Surprisingly, I'm getting the same correlation (0.8) but a different number of genes in the respective module for each power. Why are the correlation values the same in each case, and which power should I use now?
Thanks
I don't quite follow your question, so help me understand.
Are you wondering: 1) why the number of genes in the modules changes with different power thresholds; and 2) why the module-trait correlation values do not change with different power thresholds?
Sorry for the confusion. My question is: 1) The correlation values are the same for all power thresholds, but the number of genes in the module varies with the power, so I don't know how many genes to consider. For example, I am getting 86 genes (power 6), 108 genes (power 8), 156 genes (power 10), and 87 genes (power 14), all with the same correlation of 0.8. There is no particular trend in the number of genes as the power threshold increases. Hence, I can't decide how many genes are truly correlated or whether there is false clustering. 2) Yes: if the power thresholds increase from 6 to 14, with the corresponding R-squared (scale independence) increasing from 0.5 to 0.8, then why is the correlation the same in all cases? As a result, I am not able to decide which power threshold to proceed with.
First, you should use the lowest soft-thresholding power that gives a scale-free topology fit index above 0.8 with a mean connectivity below 100. A section in the WGCNA FAQ explains how to choose the soft-thresholding power.
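As a rough illustration of the rule of thumb above (fit index above 0.8, mean connectivity below 100), the selection can be sketched in a few lines. It is written in Python rather than R only so that it is self-contained. The table values are made up for illustration and are not the poster's actual fit indices, pick_power is a hypothetical helper rather than a WGCNA function, and taking the smallest qualifying power is an assumption based on common WGCNA practice.

```python
# Hypothetical fit-index rows: (power, signed R^2, mean connectivity).
# These numbers are invented for illustration only.
fit_indices = [
    (6, 0.50, 210.0),
    (8, 0.62, 120.0),
    (10, 0.71, 75.0),
    (12, 0.76, 50.0),
    (14, 0.81, 35.0),
]


def pick_power(rows, min_r2=0.8, max_mean_k=100.0):
    """Return the smallest power whose fit index is at least min_r2 and
    whose mean connectivity is below max_mean_k, or None if none qualifies."""
    ok = [power for power, r2, mean_k in rows
          if r2 >= min_r2 and mean_k < max_mean_k]
    return min(ok) if ok else None


chosen = pick_power(fit_indices)
print("chosen power:", chosen)
```

In R, the corresponding columns should be available in the fitIndices table returned by pickSoftThreshold() (Power, SFT.R.sq, slope and mean.k.), so the same filter can be applied there directly.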
Second, the reason you always get the same (exactly the same?) correlation of 0.8 in the module-trait relationship could be that, no matter which power you pick, the hub genes in that particular cluster do not change.
Third, would you mind sharing the output of the pickSoftThreshold() function? How many samples do you have?
Thank you for the suggestion. The correlation values are not exactly the same, but vary in the range of 0.82 (power 6) to 0.84 (power 14). I have a total of 463 samples and 14k genes. The graph and fit indices values are attached below.
The data looks good to me. Use 14 and go ahead with the analysis. The differences you see in module size are mostly caused by non-hub genes. Because these genes contribute little to the module eigengenes, you do not see a significant change in the module-trait correlation values.
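The point about non-hub genes can be checked on simulated data. The sketch below is a toy model, not WGCNA itself: it assumes a module eigengene behaves like the first principal component of the standardized expression matrix (computed here by power iteration), and every number in it (60 samples, 20 hub genes, 30 weakly connected genes, the noise levels) is an arbitrary assumption chosen for illustration.

```python
import math
import random


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))
    return [(x - m) / sd for x in values]


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def eigengene(genes, n_iter=100):
    """First principal component across samples, by power iteration.

    genes: list of per-gene expression lists (one value per sample).
    """
    z = [standardize(g) for g in genes]
    n = len(z[0])
    # Start from the per-sample sum so the sign follows average expression.
    v = [sum(g[i] for g in z) for i in range(n)]
    for _ in range(n_iter):
        # Project every gene onto v, then rebuild v from those loadings.
        loadings = [sum(gi * vi for gi, vi in zip(g, v)) for g in z]
        v = [sum(l * g[i] for l, g in zip(loadings, z)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


rng = random.Random(0)
n_samples = 60
signal = [rng.gauss(0, 1) for _ in range(n_samples)]
trait = [s + rng.gauss(0, 0.6) for s in signal]

# Hub genes track the shared signal closely; peripheral genes only weakly.
hubs = [[s + rng.gauss(0, 0.3) for s in signal] for _ in range(20)]
peripheral = [[0.3 * s + rng.gauss(0, 1.0) for s in signal] for _ in range(30)]

r_small = pearson(eigengene(hubs), trait)               # hub genes only
r_large = pearson(eigengene(hubs + peripheral), trait)  # hubs + peripheral
print("hubs only:", round(r_small, 3))
print("hubs + peripheral genes:", round(r_large, 3))
```

In this toy setup the module more than doubles in size, yet the eigengene-trait correlation should stay close to its original value, which is the same pattern as module sizes changing across powers while the correlation stays around 0.8.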
Thanks a lot for helping me out. I understand the explanation and will proceed with a power of 14.
Perhaps additional sample QC and sample/gene filtering are needed to clean up your data before optimizing your WGCNA analysis?
I already performed a heterogeneity analysis and removed outliers before running WGCNA. I also preprocessed the data according to the WGCNA guidelines, so I now have a good, homogeneous dataset.