I am trying to estimate the memory requirements for running the final steps of OMA standalone (after the all-vs-all phase) on an HPC cluster (Slurm), without HOG inference.
From the benchmarks in the manual, the suggested formula is 400 MB * pow(nr_genomes, 1.4).
This works for the metazoan dataset described there (60 metazoan genomes were successfully computed using 120 GB).
The requirements for bacterial genomes are reported to be lower (50 GB for 60 genomes).
I adjusted the coefficient to match that figure, giving roughly 166 MB * pow(nr_genomes, 1.4).
Does this match other people's experiences with bacterial or archaeal genomes?
Would leaving out HOG inference reduce those requirements significantly?
I was hoping to use a dataset of ~400 genomes, but I may have to rethink that if it will need around 730 GB of memory.
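For reference, here is a minimal Python sketch of the calculation I did. The exponent comes from the manual's benchmark formula; the ~166 MB coefficient is my own back-calculation from the 50 GB / 60 genomes figure, not a documented value:

```python
def mem_estimate_gb(nr_genomes, coeff_mb=400, exponent=1.4):
    """Estimated peak memory in GB: coeff_mb * nr_genomes**exponent (MB -> GB)."""
    return coeff_mb * nr_genomes ** exponent / 1000

# Manual's metazoan benchmark: ~120 GB for 60 genomes
print(mem_estimate_gb(60))                  # ~123 GB
# Back-calculated bacterial coefficient: ~50 GB for 60 genomes
print(mem_estimate_gb(60, coeff_mb=166))    # ~51 GB
# Projection for ~400 bacterial genomes
print(mem_estimate_gb(400, coeff_mb=166))   # ~730 GB
```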
Hi Adrian,
Great, thanks for the information. I had thought/hoped that they might scale differently.
I was mainly checking to get a starting point to discuss with the HPC admins: a single high-memory core with a longer-than-usual wall time doesn't fit neatly into any of their standard queues.
I'll request 100 GB as a jumping-off point and let you know how I get on.
Best wishes, Andrew