Question

OMA standalone, parallelization


4.1 years ago

mar.ark.parr ▴ 40

Hi all,

I have a set of yeast genomes (136 genomes, ~9 Mb and 6000 sequences each) and would like to find groups of orthologous genes in them using OMA standalone.

Because of the size of the dataset, I am trying to parallelize the OMA run on a cluster with the SGE scheduler.

First, I ran oma -c to convert the databases.

Then I submitted the jobs with the command qsub -t 1-32 -cwd run_oma.sh, where run_oma.sh contains two lines:

export NR_PROCESSES=32
oma

All jobs are running, but the estimated remaining times are very large (~150000 h) and have not decreased in 6 hours, so I am not sure that the run is parallelized properly.

Can anyone help me find out what is happening and how I can speed up the calculation?

Kind regards, Marina

oma parallelization sge qsub • 1.3k views

ADD COMMENT • link updated 4.1 years ago by Adrian Altenhoff ★ 1.1k • written 4.1 years ago by mar.ark.parr ▴ 40


I would recommend checking the node your job is running on to see if it's using as many processes as you expect it to be using.

ADD REPLY • link 4.1 years ago by Dave Carlson ★ 2.1k


Thank you for the answer, Dave! The output of qstat shows 32 tasks running, one line per task, like this:

job-ID   prior    name        user  state  submit/start at      queue     slots  ja-task-ID
7533430  0.55500  run_oma.sh  -     r      03/03/2021 11:10:13  all.q@fr  1      1
...
7533430  0.55500  run_oma.sh  -     r      03/03/2021 11:10:13  all.q@ze  1      32

ADD REPLY • link 4.1 years ago by mar.ark.parr ▴ 40
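
qstat confirms that 32 tasks were started, but not that each task can see its own array index. A minimal sketch of a check that could be called in run_oma.sh just before oma: it relies on SGE_TASK_ID and SGE_TASK_LAST, which SGE sets for array jobs, and on the NR_PROCESSES variable from the script above. The function name check_array_env is hypothetical, and the comparison assumes the array was submitted as 1-N. That OMA splits the work according to the array index is also an assumption here, not something confirmed in this thread.

```shell
#!/bin/sh
# Sanity check for one task of an SGE job array.
# SGE_TASK_ID and SGE_TASK_LAST are set by SGE; NR_PROCESSES comes from run_oma.sh.
# check_array_env is a hypothetical helper, not part of OMA.
check_array_env() {
    # SGE sets SGE_TASK_ID to the literal string "undefined" outside job arrays.
    if [ -z "${SGE_TASK_ID:-}" ] || [ "${SGE_TASK_ID}" = "undefined" ]; then
        echo "SGE_TASK_ID is not set: not running as an array task" >&2
        return 1
    fi
    # Assumption: the array was submitted as 1-N, so N should equal NR_PROCESSES.
    if [ -z "${NR_PROCESSES:-}" ] || [ "${NR_PROCESSES}" != "${SGE_TASK_LAST:-}" ]; then
        echo "NR_PROCESSES=${NR_PROCESSES:-unset} but SGE_TASK_LAST=${SGE_TASK_LAST:-unset}" >&2
        return 1
    fi
    # uname -n prints the node name, so tasks can be matched to hosts.
    echo "task ${SGE_TASK_ID} of ${NR_PROCESSES} on $(uname -n)"
}
```

Each task then writes one line to its own output file, which makes a missing index or a mismatched process count easy to spot.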

score 3 · Answer 1 · 2021-03-03


4.1 years ago

Adrian Altenhoff ★ 1.1k

Hi Marina,

These estimates are rather rough: each process bases its estimate only on the work it is doing itself, so they could be quite a bit off. Use oma-status to get a better sense of how far along you are in the overall process. Also, depending on the performance of the filesystem, it might be wise to run oma-compact regularly to summarize some of the result files.

Cheers, Adrian

ADD COMMENT • link 4.1 years ago by Adrian Altenhoff ★ 1.1k
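
One way to follow the advice above without typing the commands by hand is a small wrapper that repeats a command at a fixed interval. This is only a sketch: run_periodically is a hypothetical helper, it calls oma-status or oma-compact with no arguments because no options are mentioned in this thread, and it assumes both tools are on the PATH and are run from the OMA working directory.

```shell
#!/bin/sh
# run_periodically COUNT INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND COUNT times, sleeping INTERVAL_SECONDS between runs.
# Hypothetical helper; note that it uses global shell variables (POSIX sh has no "local").
run_periodically() {
    count=$1
    interval=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        # Report a failing run but keep going with the next one.
        "$@" || echo "warning: '$*' exited with status $?" >&2
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, run_periodically 24 3600 oma-status would print the overall progress once an hour for a day, and the same call with oma-compact would summarize the result files at that interval.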


Hi Adrian, thank you for the answer! Unfortunately, according to the oma-status output, the estimates seem quite reasonable:

Summary of OMA standalone All-vs-All computations:
Nr chunks started: 32 (0.01%)
Nr chunks finished: 1151 (0.42%)
Nr chunks finished w/o exported genomes: 1151 (0.42%)

ADD REPLY • link 4.1 years ago by mar.ark.parr ▴ 40
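
The oma-status summary above can be turned into a rough wall-clock estimate by simple extrapolation. The sketch below assumes the summary was taken about 6 hours into the run (the thread does not say exactly when) and that chunks keep finishing at the same average rate. estimate_remaining_hours is a hypothetical helper, not part of OMA.

```shell
#!/bin/sh
# estimate_remaining_hours PERCENT_FINISHED ELAPSED_HOURS
# Linear extrapolation: remaining = elapsed * (100 - percent) / percent.
# Hypothetical helper; assumes a constant completion rate across the whole run.
estimate_remaining_hours() {
    awk -v pct="$1" -v elapsed="$2" 'BEGIN {
        # A percentage of 0 (or outside 0-100) cannot be extrapolated.
        if (pct <= 0 || pct > 100) { exit 1 }
        printf "%.0f\n", elapsed * (100 - pct) / pct
    }'
}

# 0.42% finished after an assumed 6 hours of running:
estimate_remaining_hours 0.42 6   # prints 1423
```

Under those assumptions about 1423 hours remain, roughly 59 days of wall-clock time with 32 processes. The reported percentage is rounded to two decimals and early chunks may not be representative, so this is an order-of-magnitude figure only.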