I'm running a WGCNA analysis on ~50,000 transcripts with the blockwiseModules() function:
modules = blockwiseModules(wgcna_data, maxBlockSize = 10000, checkMissingData = TRUE,
                           minModuleSize = 20, deepSplit = 4, mergeCutHeight = 0.25,
                           power = power, networkType = 'signed',
                           replaceMissingAdjacencies = FALSE)
I end up with ~250 gene modules, some of which contain thousands of genes. However, I'd like to get more specific modules, i.e., to break up these large modules into whatever submodules exist within them in the clustering tree, so I can explore their expression substructure. I tried increasing deepSplit to 4 and mergeCutHeight to 0.25, but this did not substantially increase the number of modules. Is there some way I can make the clustering algorithm more stringent about module membership? Would it be possible to apply a different clustering algorithm to the dendrograms that allows more stringent module cutoffs?
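On the merging step itself: in WGCNA, mergeCutHeight is the dissimilarity (1 minus the correlation between module eigengenes) below which modules are merged, so raising it above the blockwiseModules default of 0.15 merges more modules rather than fewer. A simplified base-R sketch of that step follows. It is not WGCNA's code (the package's moduleEigengenes() and mergeCloseModules() do this properly); the helpers moduleEigengene() and mergeByEigengene() and the simulated data are hypothetical.

```r
# Simplified, illustrative sketch of eigengene-based module merging;
# NOT WGCNA's implementation (see moduleEigengenes()/mergeCloseModules()).
# datExpr: samples x genes matrix; colors: one module label per gene.
moduleEigengene <- function(x) {
  z <- scale(x)
  pc1 <- prcomp(z, center = FALSE)$x[, 1]
  # Flip the sign so the eigengene follows the module's average expression
  if (cor(pc1, rowMeans(z)) < 0) pc1 <- -pc1
  pc1
}

mergeByEigengene <- function(datExpr, colors, cutHeight) {
  mods <- sort(unique(colors))
  MEs <- sapply(mods, function(m) moduleEigengene(datExpr[, colors == m, drop = FALSE]))
  # Cluster eigengenes on 1 - cor and cut at cutHeight: modules joined below
  # that height are merged, so a LARGER cutHeight gives FEWER modules
  tree <- hclust(as.dist(1 - cor(MEs)), method = "average")
  groups <- cutree(tree, h = cutHeight)
  # Name each merged group after its alphabetically first module
  newLabel <- sapply(split(names(groups), groups), `[`, 1)
  unname(newLabel[as.character(groups[colors])])
}

# Simulated data: modules A and B carry nearly the same signal, C is unrelated
set.seed(4)
n <- 40
fA <- rnorm(n)
fB <- fA + rnorm(n, sd = 0.3)
fC <- rnorm(n)
makeGenes <- function(f, k) f + matrix(rnorm(n * k, sd = 0.3), n)
datExpr <- cbind(makeGenes(fA, 10), makeGenes(fB, 10), makeGenes(fC, 10))
colors <- rep(c("A", "B", "C"), each = 10)

few <- mergeByEigengene(datExpr, colors, cutHeight = 0.15)   # A and B merge
many <- mergeByEigengene(datExpr, colors, cutHeight = 0.01)  # nothing merges
table(few)
table(many)
```

In this toy example the eigengenes of A and B differ by only a few hundredths, so any cut height above that folds them into one module.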
You can specify a maximum number of genes per module to avoid big modules. You could also apply the same clustering and merging procedure (including the mergeCutHeight step) to an individual large module.
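A minimal sketch of re-clustering a single large module, assuming the expression matrix is samples x genes and that `colors` is the per-gene label vector from blockwiseModules (e.g. modules$colors). It swaps in simpler pieces than blockwiseModules uses: a plain signed adjacency instead of TOM, and base R's static cutree() instead of a dynamic tree cut (dynamicTreeCut's cutreeDynamic() would be the closer match). The helper subclusterModule() and the values power = 12, cutHeight = 0.9 and minSize = 20 are illustrative assumptions, not recommendations.

```r
# Hypothetical helper, not a WGCNA function: re-cluster the genes of one module.
# datExpr: samples x genes matrix; colors: per-gene module labels
# (e.g. modules$colors from blockwiseModules()).
subclusterModule <- function(datExpr, colors, module, power, cutHeight, minSize) {
  idx <- which(colors == module)
  sub <- datExpr[, idx, drop = FALSE]
  # Signed adjacency, as for networkType = "signed": ((1 + cor) / 2)^power
  adj <- ((1 + cor(sub, use = "pairwise.complete.obs")) / 2)^power
  # Average-linkage tree on 1 - adjacency (WGCNA would use TOM dissimilarity)
  tree <- hclust(as.dist(1 - adj), method = "average")
  # Static cut at a fixed height (WGCNA uses a dynamic tree cut instead)
  subLabels <- cutree(tree, h = cutHeight)
  # Sub-clusters smaller than minSize are relabelled 0 (unassigned)
  sizes <- table(subLabels)
  tooSmall <- as.integer(names(sizes)[sizes < minSize])
  subLabels[subLabels %in% tooSmall] <- 0L
  setNames(subLabels, colnames(datExpr)[idx])
}

# Simulated "turquoise" module hiding two sub-modules of 25 genes each
set.seed(3)
n <- 30
f1 <- rnorm(n)
f2 <- as.numeric(residuals(lm(rnorm(n) ~ f1)))  # uncorrelated with f1
noise <- function(k) matrix(rnorm(n * k, sd = 0.1), n)
datExpr <- cbind(f1 + noise(25), f2 + noise(25), matrix(rnorm(n * 10), n))
colnames(datExpr) <- paste0("gene", 1:60)
colors <- c(rep("turquoise", 50), rep("blue", 10))

subModules <- subclusterModule(datExpr, colors, "turquoise",
                               power = 12, cutHeight = 0.9, minSize = 20)
table(subModules)
```

Because the signed adjacency is raised to a high power, weakly correlated genes end up with dissimilarities close to 1, which is why a fairly high cut height still separates the two hidden groups here.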
What's the parameter for max genes per module? I don't see it in the documentation.
I could only find the minimum, minModuleSize; I might have confused it with another option. Sorry.
Late to the party, unfortunately, but I think you should first try to reduce the number of genes/transcripts given as input, removing low expressors and transcripts that do not change across conditions. It is unlikely that 50k transcripts act in concert under any single biological condition; by removing the unaffected genes, you can probably get rid of many random correlations.
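The number of gene pairs, and with it the number of chance correlations, grows quadratically with the number of genes: 50,000 transcripts give about 1.25 billion pairs, while 20,000 give about 200 million. A minimal sketch of a mean-expression filter, assuming a samples x genes matrix on a log scale (WGCNA's layout); the helper filterLowExpression() and the cutoff minMean = 4 are hypothetical and would need to be chosen for real data.

```r
# Minimal sketch: drop low-expressed genes before building the network.
# Assumes datExpr is samples x genes (WGCNA layout) on a log scale;
# minMean is a hypothetical threshold to be chosen for your data.
filterLowExpression <- function(datExpr, minMean) {
  geneMeans <- colMeans(datExpr, na.rm = TRUE)
  datExpr[, geneMeans >= minMean, drop = FALSE]
}

set.seed(1)
nSamples <- 20
# 10 genes near log-expression 1 (low) and 40 genes near 8 (expressed)
datExpr <- cbind(matrix(rnorm(nSamples * 10, mean = 1, sd = 0.5), nSamples),
                 matrix(rnorm(nSamples * 40, mean = 8, sd = 0.5), nSamples))
colnames(datExpr) <- paste0("gene", 1:50)
filtered <- filterLowExpression(datExpr, minMean = 4)
ncol(filtered)
```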
The authors do recommend removing the "noisier" genes, by either mean expression or variance, but they also recommend against filtering by differential expression; see FAQ question 2.
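Variance filtering does not use sample labels, so it stays clear of the differential-expression filtering the FAQ advises against. A minimal sketch that keeps the most variable genes, assuming the same samples x genes layout; keepTopVariance() and nTop are hypothetical names, and how many genes to keep is a judgment call.

```r
# Minimal sketch: keep the nTop most variable genes (samples x genes layout).
# No sample labels or differential-expression test are involved.
keepTopVariance <- function(datExpr, nTop) {
  geneVar <- apply(datExpr, 2, var, na.rm = TRUE)
  top <- order(geneVar, decreasing = TRUE)[seq_len(min(nTop, length(geneVar)))]
  # sort() keeps the retained genes in their original column order
  datExpr[, sort(top), drop = FALSE]
}

set.seed(2)
nSamples <- 30
# genes 1-5 vary strongly across samples, genes 6-100 barely vary
datExpr <- cbind(matrix(rnorm(nSamples * 5, sd = 10), nSamples),
                 matrix(rnorm(nSamples * 95, sd = 1), nSamples))
colnames(datExpr) <- paste0("gene", 1:100)
topVar <- keepTopVariance(datExpr, nTop = 5)
colnames(topVar)
```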
"filter by mean expression or variance is a matter of debate": which one is better, or which one do you use? I usually filter genes with:
keep <- rowSums(edgeR::cpm(DataExpr) > 1) > 1  # genes (rows) with CPM > 1 in at least 2 samples
Is this good enough?
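For reference, the filter above keeps a gene when its counts per million exceed 1 in at least two samples; with a library of 20 million reads, CPM > 1 means more than 20 reads in that sample. A base-R restatement, assuming a genes x samples matrix of raw counts and library sizes equal to the column sums (which is what edgeR's cpm() uses for a plain matrix without normalization factors); cpmFilter() is a hypothetical helper.

```r
# Base-R restatement of the CPM filter, for illustration only.
# Assumes counts is a genes x samples matrix of raw counts and that library
# sizes are the column sums (edgeR::cpm's default for a plain matrix).
cpmFilter <- function(counts, minCPM = 1, minSamples = 2) {
  libSizeMillions <- colSums(counts) / 1e6
  cpmMat <- sweep(counts, 2, libSizeMillions, "/")
  # TRUE for genes whose CPM exceeds minCPM in at least minSamples samples
  rowSums(cpmMat > minCPM) >= minSamples
}

# Toy counts with every library size equal to exactly 1 million reads,
# so CPM equals the raw count
counts <- rbind(geneA  = c(5, 5, 0),
                geneB  = c(1, 0, 3),  # CPM of exactly 1 does not pass "> 1"
                geneC  = c(0, 0, 0),
                filler = c(999994, 999995, 999997))
keep <- cpmFilter(counts)
keep  # geneA and filler pass; geneB and geneC do not
```

WGCNA functions expect samples in rows and genes in columns, so a genes x samples matrix filtered this way is usually transposed with t() before network construction.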