I have a microarray dataset that I was hoping to process with a gene network construction algorithm, most notably WGCNA, but I am having trouble determining whether my current dataset is appropriate for network construction.
I have tried a number of different subsets of probes and samples, and have also tried collapseRows, but I'm finding that the powers I would need to select to achieve a Scale Free Topology Model Fit index near 0.9 are extremely high, usually a soft-thresholding power of 25 or greater. By comparison, in the WGCNA tutorials and other material I've seen, common powers are between 6 and 10.
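For reference, this is roughly the check I've been running (datExpr is just a placeholder name for my expression matrix, with samples in rows and probes in columns):

library(WGCNA)

# datExpr: placeholder for my expression matrix (samples in rows, probes in columns)
powers <- c(1:10, seq(12, 30, by = 2))
sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)

print(sft$fitIndices)   # scale-free fit R^2 and mean connectivity at each power
sft$powerEstimate       # lowest power crossing the R^2 cutoff (NA if none does)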
I know that if the model fit index isn't high, the network won't approximate a scale-free topology and the connectivity will be too high to be useful. However, I haven't figured out what factors in the dataset would be contributing to this. Admittedly, my sample size is small, only 11 samples, but I didn't see any recommendations for determining a minimum sample size, nor any way to calculate one. Does anyone have any sort of 'best practices' regarding this for WGCNA? Should I go ahead and run through the rest of the WGCNA workflow even if I need to select a power of 30 or so to get a model fit index near the suggested 0.9?
We have a number of datasets we'd like to apply this to, but I'm getting concerned now because we usually have only three biological replicates and typically only a few conditions to test. If this isn't going to work, I'll need to find a similar method that is more robust to small sample sizes, even if it's less effective overall than WGCNA.
Thank you!
Adam, AFAIK there are no "best practices" (yet) for this algorithm. The WGCNA documentation encourages you to tailor your soft-thresholding power to your data -- but with no guidelines, that is not very helpful, I agree.
What happens when you take your data through the workflow? Do you identify any modules? How does module membership vary with different powers? That may be helpful -- you may find a point at which module membership stops changing as the power changes, and that would be your answer.
In fact, since there are no published best practices, running your data with different powers could be helpful in establishing those practices.
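As a rough illustration (datExpr is a placeholder for your expression matrix, and the powers and settings are purely illustrative), you could compare module labels across a few candidate powers and see where they stabilize:

library(WGCNA)

# datExpr: placeholder for your expression matrix (samples in rows, probes in columns)
powers_to_try <- c(10, 20, 30)
labels_by_power <- lapply(powers_to_try, function(p) {
  net <- blockwiseModules(datExpr, power = p, minModuleSize = 30, verbose = 0)
  net$colors   # module label for each probe
})
names(labels_by_power) <- paste0("power", powers_to_try)

# Cross-tabulate assignments from neighbouring powers; if membership barely
# changes between two powers, the exact choice in that range matters less
table(labels_by_power$power10, labels_by_power$power20)
table(labels_by_power$power20, labels_by_power$power30)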
As for the minimum number of samples you asked about -- the suggested minimum n of 12 implies there may be something in the algorithm that can be modified for smaller data sets, with the caveat that module identification may not be as robust. Just some thoughts.
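If you do try it on a small data set, the obvious knobs to experiment with (and I'm only guessing that these are the relevant ones) are the module detection settings in blockwiseModules, e.g.:

library(WGCNA)

# datExpr: placeholder for a small expression matrix; parameter values are illustrative only
net <- blockwiseModules(datExpr,
                        power          = 25,    # whatever pickSoftThreshold points to
                        deepSplit      = 2,     # sensitivity of module splitting (0-4)
                        minModuleSize  = 20,    # allow smaller modules on a small data set
                        mergeCutHeight = 0.25,  # merge modules with highly correlated eigengenes
                        verbose        = 0)
table(net$colors)   # number of probes assigned to each module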