I am trying to estimate the memory requirements for running the final steps of OMA standalone (after the all-vs-all phase) on an HPC cluster (Slurm), without HOG inference.
From the benchmarks in the manual, the suggested formula is 400 MB * pow(nr_genomes, 1.4).
This works for the metazoan dataset described there (60 metazoan genomes were successfully computed using 120 GB).
The requirements for bacterial genomes are reported to be lower (50 GB for 60 genomes).
I adjusted the coefficient to match that figure, giving roughly 166 MB * pow(nr_genomes, 1.4).
Does this match other people's experiences with bacterial or archaeal genomes?
Would leaving out HOG inference reduce those requirements significantly?
I was hoping to use a dataset of ~400 genomes, but I may have to rethink that if it will need around 730 GB of memory.
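For reference, here is a minimal Python sketch of the calculation I did. The exponent comes from the manual's benchmark formula; the ~166 MB coefficient is my own back-calculation from the 50 GB / 60 genomes figure, not a documented value:

```python
def mem_estimate_gb(nr_genomes, coeff_mb=400, exponent=1.4):
    """Estimated peak memory in GB: coeff_mb * nr_genomes**exponent (MB -> GB)."""
    return coeff_mb * nr_genomes ** exponent / 1000

# Manual's metazoan benchmark: ~120 GB for 60 genomes
print(mem_estimate_gb(60))                  # ~123 GB
# Back-calculated bacterial coefficient: ~50 GB for 60 genomes
print(mem_estimate_gb(60, coeff_mb=166))    # ~51 GB
# Projection for ~400 bacterial genomes
print(mem_estimate_gb(400, coeff_mb=166))   # ~730 GB
```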
Hi Adrian,
Great, thanks for the information. I had thought/hoped that they might scale differently.
I was mainly checking to get a starting point to discuss with the HPC admins: a single high-memory core with a longer-than-usual wall time doesn't fit neatly into any of their standard queues.
I'll request 100 GB as a jumping-off point and let you know how I get on.
Best wishes, Andrew