RepeatModeler cannot be run in parallel
8.2 years ago
tlorin ▴ 370

Dear all,

I am trying to run RepeatModeler with multiple processors with the -pa option.

I downloaded RepeatModeler (latest version available here) and installed it properly. It runs perfectly, but it takes a really long time.

The documentation here and here says that I should be able to do so. Am I missing something?

My command is "RepeatModeler -pa 3 -database mydatabase", and it fails with "Unknown option: pa".

Many thanks!


EDIT: When I run RepeatModeler -h, the option is not listed, so it seems it is not supported:

NAME
    RepeatModeler - Model repetitive DNA

SYNOPSIS
      RepeatModeler [-options] -database <XDF Database>

DESCRIPTION
    The options are:

    -h(elp)
        Detailed help

    -database
        The prefix name of a XDF formatted sequence database containing the
        genomic sequence to use when building repeat models. The database
        may be created with the WUBlast "xdformat" utility or with the
        RepeatModeler wrapper script "BuildXDFDatabase".

    -engine <abblast|wublast|ncbi>
        The name of the search engine we are using. I.e abblast/wublast or
        ncbi (rmblast version).

SEE ALSO
        RepeatMasker, WUBlast

COPYRIGHT
     Copyright 2005-2010 Institute for Systems Biology

AUTHOR
     Robert Hubley <rhubley@systemsbiology.org>
     Arian Smit <asmit@systemsbiology.org>

Hi!

I am struggling with the same problem. Any clue how to solve it?

Thanks in advance,

Cristina

8.2 years ago
microfuge ★ 1.9k

Are you sure you downloaded version 1.0.8 and not accidentally 1.0.7? I have not used either, but the release notes for 1.0.8 say that the parallel functionality was added in that version.


Haha, I checked, and I did install version 1.0.8 :)


Ah, OK :). I got suspicious because the help text in your post says "Copyright 2005-2010", whereas the 1.0.8 RepeatModeler script says "Copyright 2005-2014". Maybe this information is taken from somewhere else in the program.


You were right: the OP was not using the latest version. With version 1.0.8, the help output shows:

    -pa #
        Specify the number of shared-memory processors available to this
        program. RepeatModeler will use the processors to run BLAST searches
        in parallel. i.e on a machine with 10 cores one might use 1 core for
        the script and 9 cores for the BLAST searches by running with "-pa 9".
    ...
     Copyright 2005-2014 Institute for Systems Biology

According to the documentation, only the BLAST searches are parallelized, not helper programs such as edgeredef and eleredef.
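Because the two versions differ in their help output, one quick way to check which RepeatModeler is on your PATH is to look for the -pa option (and the copyright years) in the output of RepeatModeler -h. Below is a minimal Python sketch of that check. The function names are hypothetical, and read_help assumes the help text is printed to stdout and/or stderr, which I have not verified for every release.

```python
import re
import subprocess


def read_help(executable="RepeatModeler"):
    """Run '<executable> -h' and return everything it printed.

    Assumption: the help text ends up on stdout and/or stderr, so both are combined.
    """
    result = subprocess.run([executable, "-h"], capture_output=True, text=True)
    return result.stdout + result.stderr


def supports_parallel_option(help_text):
    """Return True if the help text lists -pa as an option.

    Only matches '-pa' at the start of a line (after indentation), so words
    like 'shared-memory' or options like '-parallel' do not count.
    """
    return re.search(r"^\s*-pa\b", help_text, flags=re.MULTILINE) is not None


def copyright_years(help_text):
    """Return the (first, last) years of the 'Copyright YYYY-YYYY' line, or None."""
    match = re.search(r"Copyright\s+(\d{4})-(\d{4})", help_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For example, supports_parallel_option(read_help()) should be False for the help text in the original question (which ends with "Copyright 2005-2010") and True for the 1.0.8 output quoted above.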

7.1 years ago
Michael 55k

It seems the problem is eleredef, which does not support parallelism. BLAST does use multiple processes, but eleredef appears to be the step that takes so long: it takes longer with every round, and I have never seen it use more than 50% CPU.

 PID USER   PR  NI    VIRT    RES    SHR    S  %CPU %MEM     TIME+ COMMAND
 123  md    20   0    3118032 2.970g   612  D   5.0  0.1   2877:59 eleredef

The helper programs do not support parallelism, which means that -pa only affects BLAST and has no effect on edgeredef, eleredef, etc.

It would be a very nice feature to parallelize edgeredef and friends, e.g. with GNU parallel. To accomplish that, however, one would have to understand what they are doing; the man pages don't provide much information, so we'd have to look into the code.

edgeredef
usage: fam_def seq_list start
where seq_list is the list of sequence names, start is the index of the element to start defining families. The latter one is optional.

eleredef
usage: redef seq_list start clan_ct
where seq_list is the list of sequence names, start is the index of the element to start redefining, and clan_ct is the number to start counting the number of clans. The latter two are optional.

That is all I know about what these programs do. Both take a seq_list parameter; it might be worth trying to split that list into smaller chunks, but I suspect that doing so would affect the result.
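If someone wants to try the chunking idea, a first step could look like the minimal Python sketch below. It is entirely hypothetical: it assumes seq_list is a plain-text file with one sequence name per line, and the output file names (seq_list.partNNN) are made up. Whether edgeredef/eleredef accept arbitrary subsets, and whether merging per-chunk results matches a single run, is untested; as noted above, the results would likely differ.

```python
from pathlib import Path


def split_seq_list(names, n_chunks):
    """Split a list of names into n_chunks contiguous, near-equal parts (order preserved)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    base, extra = divmod(len(names), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional name each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(names[start:end])
        start = end
    return chunks


def write_chunks(seq_list_path, n_chunks, out_dir):
    """Write each chunk to its own file (hypothetical naming) and return the paths.

    Assumption: the input file has one sequence name per line; blank lines are skipped.
    """
    text = Path(seq_list_path).read_text()
    names = [line.strip() for line in text.splitlines() if line.strip()]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(split_seq_list(names, n_chunks)):
        path = out / f"seq_list.part{i:03d}"
        path.write_text("".join(f"{name}\n" for name in chunk))
        paths.append(path)
    return paths
```

Each chunk file could then, in principle, be handed to a separate eleredef or edgeredef invocation, for example through GNU parallel.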

6.3 years ago
nehleen11 • 0

Dear all, I had the same problem but later figured it out: you haven't used the -engine option, which is mandatory. At first I made the same mistake; I was able to run the program once I specified the engine.
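Putting the two options from this thread together, a minimal Python sketch of assembling a RepeatModeler 1.0.8 command line is shown below. The function name is hypothetical; the engine names come from the help text in the question, and reserving one core for the script follows the "-pa 9 on a 10-core machine" example in the 1.0.8 help. Whether -engine is strictly required depends on your installation; here it is always passed, as the poster above suggests.

```python
import os
import shlex


def build_repeatmodeler_cmd(database, total_cores=None, engine="ncbi"):
    """Return a RepeatModeler command as an argument list.

    One core is left for the RepeatModeler script itself; the remaining cores
    go to -pa, which (per the 1.0.8 help) only parallelizes the BLAST searches.
    """
    if engine not in ("abblast", "wublast", "ncbi"):
        raise ValueError(f"unknown engine: {engine}")
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    pa = max(1, total_cores - 1)  # never ask for fewer than one BLAST process
    return ["RepeatModeler", "-engine", engine, "-pa", str(pa), "-database", database]
```

For example, shlex.join(build_repeatmodeler_cmd("mydatabase", 10)) gives RepeatModeler -engine ncbi -pa 9 -database mydatabase, and the list itself can be passed to subprocess.run(cmd, check=True). Keep in mind that a larger -pa will not speed up the non-parallel helpers such as eleredef.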
