How To Parallelize Codeml With Python
12.8 years ago
Biojl ★ 1.7k

Hi,

I was wondering if anyone has already succeeded in running PAML (dN/dS) / branch-site tests on multicore computers. I normally use http://www.parallelpython.com/

The problem is that the CODEML program creates temporary files in the folder it is run from, so I can't run more than one instance at a time. Would it be possible to change the working directory every time we launch a copy of CODEML?

Any help will be welcome :)

codeml paml python parallel • 8.2k views
12.2 years ago
mgalactus ▴ 780

I've successfully used the Python multiprocessing module.

You can find the working example here: https://gist.github.com/3743820


Do you have a clear example of how it works? I can't figure it out from the code, and the example is not working either.


Yeah, you are right, the script lacks some explanations:

  • First of all, prepare a directory with all your alignments in PHYLIP format (named something like GENE_ID.phy)
  • Then create a directory called "tmpcodeml" (in the same directory as the previous one) containing one codeml control file for each alignment, named like GENE_ID.phy.ctl. You can easily create these files using Biopython.
  • Launch the script as follows: python parallelPAML.py -a FIRST_DIRECTORY -t TREE.nwk -r 7 (the -r option sets how many CPUs to use)
  • At the end, your tmpcodeml directory will contain a bunch of output files (.out, .rst and .rst1)
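
For the control-file step, here is a minimal sketch that writes one GENE_ID.phy.ctl per alignment using only the standard library (Biopython's Bio.Phylo.PAML.codeml module can do the same job). The function name write_ctl_files, the output file naming and every option value in the template are assumptions for illustration, not the settings the gist expects; set model and NSsites to match the test you actually want to run.

```python
from pathlib import Path

# Illustrative codeml options only (an assumption, not taken from the gist).
CTL_TEMPLATE = """\
seqfile = {seqfile}
treefile = {treefile}
outfile = {outfile}
noisy = 0
verbose = 0
runmode = 0
seqtype = 1
CodonFreq = 2
model = 0
NSsites = 0
icode = 0
fix_kappa = 0
kappa = 2
fix_omega = 0
omega = 0.5
"""


def write_ctl_files(alignment_dir, tree_file, ctl_dir="tmpcodeml"):
    """Write GENE_ID.phy.ctl into ctl_dir for every GENE_ID.phy alignment."""
    alignment_dir = Path(alignment_dir).resolve()
    tree_file = Path(tree_file).resolve()
    ctl_dir = Path(ctl_dir)
    ctl_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for aln in sorted(alignment_dir.glob("*.phy")):
        ctl_path = ctl_dir / (aln.name + ".ctl")
        ctl_path.write_text(
            CTL_TEMPLATE.format(
                seqfile=aln,          # absolute path (an assumption; relative paths also work)
                treefile=tree_file,   # the same tree is reused for every gene
                outfile=aln.name + ".out",
            )
        )
        written.append(ctl_path)
    return written
```

Calling write_ctl_files("FIRST_DIRECTORY", "TREE.nwk") would then fill tmpcodeml in the current directory before the script is launched.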

Sorry if it looks a bit convoluted, but it was written in a rush :)


This is really helpful. Just a quick question: do I have to change the NSsites value at line 36 of the script to run the M7 and M8 models?

Thanks

12.8 years ago

The PAML code is very convoluted, so don't expect to be able to go in and fix the program to Do the Right Thing.

I would write a small wrapper around codeml that creates a temp directory and runs codeml inside it. That way, you should be able to run as many codeml instances as you want in parallel.
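
A minimal sketch of such a wrapper, assuming codeml is on the PATH and takes the control file name as its only argument. The function name run_codeml_isolated and its parameters are hypothetical, and the control file is assumed to refer to its inputs by bare file name, since they are copied next to it in the scratch directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_codeml_isolated(ctl_file, input_files, results_dir, executable="codeml"):
    """Run one codeml job inside its own scratch directory."""
    ctl_file = Path(ctl_file)
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    # A fresh directory per job keeps codeml's scratch files (rst, rst1, rub, ...)
    # from colliding with those of other running instances.
    workdir = Path(tempfile.mkdtemp(prefix="codeml_"))
    try:
        for src in [ctl_file, *input_files]:
            shutil.copy(src, workdir)
        # cwd= changes the working directory for the child process only.
        proc = subprocess.run(
            [executable, ctl_file.name],
            cwd=workdir,
            capture_output=True,
            text=True,
        )
        # Keep everything the run produced, one sub-directory per job.
        dest = results_dir / ctl_file.stem
        shutil.copytree(workdir, dest, dirs_exist_ok=True)
        return proc.returncode, dest
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

Because the working directory is set per child process, several of these calls can run at the same time without touching each other's files.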


Basically that's what I'm asking in the question...


Yes, a python or shell script wrapper would be useful in this instance. I hope that either these jobs are small or you're running these multiple instances on a cluster/GRID system.


Thanks! Great resource

12.8 years ago
Liam Thompson ▴ 140

I do not know of any way to run parallel PAML computations on a single dataset. I spent a fair amount of time looking into this and was assured by computational experts that the C-based PAML code would need to be reprogrammed to support multi-processor or parallel computation. I don't think this is going to happen anytime soon.

I managed by reducing the size of the datasets and submitting the jobs to the queue system of our university cluster. That way jobs were automatically loaded once resources were free, although the computational analysis of the individual jobs remained slow.


With parallelpython it is possible to parallelize any command-line task, in the sense of distributing the jobs automatically without having to split the data into several datasets by hand. I have already done that with alignment programs (mafft, prank...). So what I am after is not a truly parallel PAML, but a way to run several instances of PAML on different processors. Hope it is clearer now.
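
A minimal sketch of that idea, using the standard library's concurrent.futures thread pool rather than parallelpython. Threads are enough here because each worker only waits for an external process. It assumes every job directory already holds a codeml.ctl and the files it refers to; the names run_job and run_all and the default command are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_job(job_dir, command=("codeml", "codeml.ctl")):
    """Run one command with job_dir as its working directory."""
    proc = subprocess.run(list(command), cwd=job_dir, capture_output=True, text=True)
    return str(job_dir), proc.returncode


def run_all(job_dirs, workers=4, command=("codeml", "codeml.ctl")):
    """Run one job per directory, at most `workers` at a time.

    Returns a dict mapping each job directory to its exit code.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda d: run_job(d, command), job_dirs))
```

For example, run_all(sorted(Path("jobs").iterdir()), workers=7) would keep seven instances busy until every directory under a hypothetical jobs folder has been processed.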
