Sequence Evolution Library In Java
13.8 years ago
Rob 6.9k

Hello,

I'm writing a biological evolution simulator. Currently, all of my code is written in Python. For the most part, this is great and everything works sufficiently well. However, there are two steps in the process which take a long time and which I'd like to rewrite in Scala.

The first problem area is sequence evolution. Imagine you're given a phylogenetic tree which relates a large set of proteins. The length of each branch represents the evolutionary distance between the parent and child. The root of the tree is seeded with a single sequence, and an evolutionary model (e.g. http://en.wikipedia.org/wiki/Models_of_DNA_evolution) is then used to evolve the sequence along the tree, taking the branch lengths into account. PyCogent takes a long time to perform this step, and I believe that a reasonable Java/Scala implementation would be significantly faster. Do you know of any libraries that implement this type of functionality? I want to write the application in Scala, so, thanks to JVM interoperability, any Java library will suffice.
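To make the step concrete, here is a minimal pure-Python sketch of what I mean. It is not how PyCogent does it. It assumes the simplest DNA model (Jukes-Cantor, JC69), no indels, and branch lengths in expected substitutions per site; the tuple-based tree layout and the names `evolve_branch` and `evolve_tree` are made up for illustration. A protein version would swap in a 20-state substitution model.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def evolve_branch(seq, branch_length, rng):
    """Evolve a sequence along one branch under JC69 (assumed model).

    branch_length is the expected number of substitutions per site.
    """
    # JC69: probability that a site ends the branch in a different state.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Each of the three other nucleotides is equally likely.
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def evolve_tree(node, parent_seq, rng, leaves=None):
    """Recursively evolve parent_seq down the tree.

    node is a hypothetical (name, branch_length, children) tuple.
    Returns a dict mapping leaf names to their evolved sequences.
    """
    if leaves is None:
        leaves = {}
    name, branch_length, children = node
    seq = evolve_branch(parent_seq, branch_length, rng)
    if not children:
        leaves[name] = seq
    for child in children:
        evolve_tree(child, seq, rng, leaves)
    return leaves

# Illustrative tree: the root has branch length 0, so the seed is unchanged there.
tree = ("root", 0.0, [
    ("A", 0.1, []),
    ("internal", 0.05, [("B", 0.2, []), ("C", 0.3, [])]),
])

rng = random.Random(42)
root_seq = "".join(rng.choice(NUCLEOTIDES) for _ in range(60))
leaves = evolve_tree(tree, root_seq, rng)
```

The per-site loop in `evolve_branch` is exactly the kind of inner loop that gets slow in Python once the sequences and trees are large.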

The second problem area is the comparison of the generated sequences. Given the protein sequences from a number of different extant species, the task is to use those sequences to reconstruct the phylogenetic tree which relates the species. This problem is inherently computationally demanding, because one must essentially do a pairwise comparison between all sequences in the extant species. Here again, however, I feel that a Java/Scala implementation would be significantly faster than a Python one, if for no other reason than the unfortunately slow speed of looping in Python. I could write this part from scratch more easily than the sequence evolution part, but I'd be willing to use a library for it as well if a good one exists.
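The all-against-all comparison can be sketched roughly as below. This is only an illustration under stated assumptions: sequences are DNA and already aligned to equal length, the distance is the Jukes-Cantor correction of the proportion of differing sites, and the function names are hypothetical. The resulting matrix is what a distance-based tree method such as neighbor joining would take as input; that step is not shown.

```python
import math
from itertools import combinations

def p_distance(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (assumed model): -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined once the sequences are saturated.
        return float("inf")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs):
    """Return a symmetric {(name_a, name_b): distance} dict for all pairs."""
    names = sorted(seqs)
    dist = {(n, n): 0.0 for n in names}
    # n * (n - 1) / 2 comparisons: this is the quadratic part.
    for a, b in combinations(names, 2):
        d = jc69_distance(seqs[a], seqs[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist

# Made-up toy sequences, only to show the call.
seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "ACGAACGTTA"}
matrix = distance_matrix(seqs)
```

The number of comparisons grows quadratically with the number of sequences, and each comparison is itself a loop over sites.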

Thanks, Rob

java python evolution model • 4.3k views

For your second problem area, I wouldn't recommend trying to re-implement phylogenetic inference in Java. It is so computationally intensive that it is generally done in C. I only know of one tool that does part of this in Java, and that's Treefinder (http://www.treefinder.de/).


I have never heard that loops are slow in Python. Aren't you confusing it with R?

13.8 years ago
Botond Sipos ★ 1.7k

Check out the Phylogenetic Analysis Library (PAL).


Excellent! This library doesn't look like it has been updated in a while, but it seems to contain all of the functionality I need. I'll probably give it a try.


A friend of mine uses PAL and is very satisfied with it.

13.8 years ago
Rvosa ▴ 580

Check out JEBL: http://sourceforge.net/projects/jebl/, which can roughly be seen as the successor to PAL, at least to the extent that some of the same people are involved (who seem to have abandoned PAL).

13.8 years ago

I wouldn't be surprised to see improvements if the slowest parts of PyCogent were moved to C. That group already seems to have people, like Daniel McDonald, who know how to wrap C, since some of the code is already written in C:

find . | grep "\.c$"
./cogent/align/_compare.c
./cogent/align/_pairwise_pogs.c
./cogent/align/_pairwise_seqs.c
./cogent/evolve/_likelihood_tree.c
./cogent/maths/_matrix_exponentiation.c
./cogent/maths/_period.c
./cogent/maths/eigen.c
./cogent/maths/matrix_invert.c
./cogent/maths/spatial/ckd3.c
./cogent/struct/_asa.c
./cogent/struct/_contact.c

Yeah, some parts of what I'm doing are clearly already in C, but others aren't quite there yet. I'll be sure to keep an eye out for library updates.

12.8 years ago
Audriusa ▴ 10

There is an interesting Java-based package designed to model evolution of all kinds (JGAP). It provides a good framework (chromosomes, selection, inheritance, evolution) and is normally used as a tool for solving various difficult problems by simulated evolution. JGAP also offers some interesting visualization tools. It is an old, mature project with hundreds of downloads per week, so I think I can recommend it.
