We are running WGCNA on ~90,000 genes in a single block with 48 threads and 192GB of memory using the blockwiseModules
function.
WGCNA takes several dozen hours to compute the topological overlap matrix. We thought that 192GB would be sufficient for the analysis, but the run fails with a memory overflow at the clustering step, after the TOM for the block has been saved.
How can we estimate the memory required for blockwiseModules
to complete successfully? We have included the output below:
..Working on block 1 .
TOM calculation: adjacency..
..will use 48 parallel threads.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
..saving TOM for block 1 into file output/100000/wgcna/TOM-block.1.RData
....clustering..
Error in fastcluster::hclust(as.dist(dissTom), method = "average") :
Memory overflow.
Calls: blockwiseModules -> <Anonymous>
Execution halted
I have seen that article before; the operative part is its memory rule of thumb. By that calculation, 80k transcripts require 64GB of memory, so imagine our surprise when moving to ~90k transcripts suddenly overloads 192GB.
These heuristics don't seem reliable. Is there a better way to estimate the required memory? This would inform which node type we choose before running WGCNA.
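A cruder but more transparent estimate is to count the dense n x n double-precision matrices that must coexist in RAM: each one occupies 8 * n^2 bytes, and around the clustering step the TOM, its dissimilarity (1 - TOM), and the dist object built by as.dist() are all live at once. A minimal sketch, where the factor of three coexisting copies is my own assumption rather than anything documented by WGCNA:

# Back-of-the-envelope peak-memory estimate for a single-block run.
# n_copies = 3 is a guess: TOM, dissimilarity (1 - TOM), the dist object
# (about half a full matrix), plus R's interim copies.
estimate_block_memory_gb <- function(n_genes, n_copies = 3) {
  bytes_per_matrix <- 8 * n_genes^2        # one dense double matrix
  n_copies * bytes_per_matrix / 1024^3     # convert to GiB
}

estimate_block_memory_gb(80000)   # ~143 GiB
estimate_block_memory_gb(90000)   # ~181 GiB

By that reckoning, ~90,000 genes already sits right at the edge of 192GB, which is consistent with the run falling over at the clustering step.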
I believe the memory used will be system-dependent, and also dependent on your version of R (it's under constant development behind the scenes). You may consider reducing your dataset, for example by filtering out low-expression or low-variance genes before constructing the network.
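A minimal sketch of that kind of pre-filtering, assuming your expression matrix is called datExpr (samples in rows, genes in columns); the 30,000-gene cut-off, the soft-thresholding power, and the file path are placeholders you would have to tune:

library(WGCNA)

# keep the top 30,000 most variable genes (arbitrary example cut-off)
geneVars <- apply(datExpr, 2, var)
keep <- rank(-geneVars, ties.method = "first") <= 30000
datExprFiltered <- datExpr[, keep]

net <- blockwiseModules(
  datExprFiltered,
  power           = 6,                  # choose via pickSoftThreshold()
  maxBlockSize    = 30000,              # smaller blocks also cap peak memory
  TOMType         = "signed",
  saveTOMs        = TRUE,
  saveTOMFileBase = "output/wgcna/TOM",
  nThreads        = 48
)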
Finally, you may try the Bioconductor support site ( https://support.bioconductor.org/t/Latest/ ), where the WGCNA developer is more active.
As I think about it, technically one could write the correlation matrix to disk as the calculations are under way, and in this way save on memory while the correlation matrix is being produced. You would then just read the matrix back into your R session later, but the maximum memory required would be lower.
Edit, 29th August 2019: in fact, I have learned that this (what I wrote above) is precisely how bigcor does it.
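For illustration, a minimal sketch of that block-wise idea using a file-backed matrix from the ff package; the function name, block size, and file name are my own choices, and bigcor (in the propagate package, as far as I recall) is a proper implementation of the same pattern:

library(ff)

# Fill an n x n correlation matrix on disk one pair of column blocks at a
# time, so only a single block of the result is held in RAM at any moment.
# x is a samples x genes numeric matrix.
blockwiseCorToDisk <- function(x, block_size = 2000, filename = "cormat.ff") {
  n <- ncol(x)
  cormat <- ff(vmode = "double", dim = c(n, n), filename = filename)
  starts <- seq(1, n, by = block_size)
  for (i in starts) {
    ii <- i:min(i + block_size - 1, n)
    for (j in starts) {
      jj <- j:min(j + block_size - 1, n)
      cormat[ii, jj] <- cor(x[, ii], x[, jj])   # written straight to disk
    }
  }
  cormat   # read back later in slices, e.g. cormat[1:5000, 1:5000]
}

Note that this only relieves the correlation/TOM stage; the hierarchical clustering in your traceback would still need the full dissimilarity in memory.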