Parallel Phylogenetic Tree Generation
4
5
14.5 years ago
Hanif Khalak ★ 1.3k

I am building some large-scale viral phylogenies from thousands of gene DNA sequences, each almost 2 kb in length, using a parallel version of ClustalW coded for an SMP machine.

I don't have access to a large cluster, but on the 16-core machine I'm using I found that most of the processing time is spent not on the pairwise alignment but on the tree building, where only one CPU is being used.

One of the runs with ~10K sequences failed to complete even after a couple of months - had to reboot, but only because of some power test. Go Linux!

Any suggestions as to alternatives that accelerate the tree generation?

phylogenetics parallel tree clustalw • 5.3k views
10
14.5 years ago
Paulo Nuin ★ 3.7k

First things first: don't use ClustalW for tree generation. It's an alignment program, and its Neighbour Joining implementation is not as good as some others available. Second, thousands of sequences will take a long time even with an NJ approach. Just calculate the number of possible tree arrangements for that many sequences and you'll see there's no magic bullet here.
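To put a number on "all possibilities of arrangements", here is a minimal sketch that counts the unrooted bifurcating topologies for n taxa using the standard (2n-5)!! formula. The function name is just an illustrative choice. Note that NJ does not enumerate this space (it is a polynomial-time clustering method); the count mainly shows why ML and Bayesian methods have to rely on heuristic search.

```python
from math import prod


def unrooted_topologies(n_taxa: int) -> int:
    """Count unrooted bifurcating tree topologies for n_taxa leaves: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # (2n-5)!! is the product of the odd numbers 1, 3, 5, ..., 2n-5
    return prod(range(1, 2 * n_taxa - 4, 2))


if __name__ == "__main__":
    for n in (4, 5, 10, 1000):
        count = unrooted_topologies(n)
        # Report only the number of decimal digits; the value itself gets huge
        print(f"{n} taxa: a number with {len(str(count))} digits")
```

Even at a few dozen taxa the count is far beyond anything that could be searched exhaustively.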

You have, AFAIK, two options:

Use RAxML, which is a very nice application and known to be fast.

Use MrBayes compiled in MPI mode, which will also take some time.

Of course, you can also try downloading a parallel NJ package. A quick Google search turned up a couple, but I don't know how fast or reliable they are.
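As a rough sketch of what a threaded RAxML run on a 16-core machine might look like, the snippet below only assembles the command line. It assumes the Pthreads build is installed under the name `raxmlHPC-PTHREADS`; the alignment file `viral_genes.phy`, the run name `viral_run1`, the GTRGAMMA model and the seed are hypothetical placeholders, not anything from the original question.

```python
import shlex


def raxml_command(alignment, run_name, threads=16, model="GTRGAMMA",
                  seed=12345, binary="raxmlHPC-PTHREADS"):
    """Build an argument list for a single threaded RAxML tree search."""
    return [
        binary,
        "-T", str(threads),   # number of threads
        "-s", alignment,      # input alignment file
        "-n", run_name,       # suffix used for the output files
        "-m", model,          # substitution model
        "-p", str(seed),      # random seed for the parsimony starting tree
    ]


# Hypothetical file and run names, for illustration only
cmd = raxml_command("viral_genes.phy", "viral_run1")
print(shlex.join(cmd))
# To actually launch it: import subprocess; subprocess.run(cmd, check=True)
```

Trying this on a small subset first, as suggested elsewhere in the thread, is probably the safest way to gauge run time before committing to the full dataset.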

1

Agreed, ClustalW is not a good choice for trees.

1

RAxML has become pretty much the gold standard for ML phylogenetic reconstruction. A reasonable alternative, and much faster, is FastTree. There is also RAxML-Light, a stripped-down version of RAxML optimized for extremely large taxonomic sets.

For your alignments there are also much better options out there than Clustal. MUSCLE is one option. I don't recall offhand whether MAFFT does nucleotides or not.

0

RAxML looks interesting - I'll have to give it a go on a small set and see how it fares; will probably give better trees as well. Thanks!

5
14.5 years ago

Do you already have an alignment?

  1. remove redundancy (through fast clustering e.g. uclust)
  2. use a fast algorithm (NJ over ML/MP/Bayes)
  3. use a fast memory efficient implementation:

I think with fewer than 10,000 sequences you should be fine.

2

NJ and MP are horrible ideas for doing trees today. There are incredibly fast implementations of full ML out there that can do thousands to tens of thousands of taxa. RAxML itself is reasonably fast on large datasets but FastTree and RAxML-Light are both optimized for extremely large bacterial and viral datasets and environmental studies.

Removing redundancy is a good idea but depending on your question and data you might only want to do it at the 100% identity level.
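For the 100% identity case, a minimal sketch could look like the following. It simply collapses sequences whose strings are exactly identical (case-insensitive) and remembers which names were merged; it is not a substitute for uclust-style clustering at lower identity thresholds. The function name and the toy records are made up for illustration.

```python
def collapse_identical(records):
    """Collapse exactly identical sequences.

    records: iterable of (name, sequence) pairs.
    Returns (unique, merged), where unique keeps the first record seen for
    each distinct sequence and merged maps that record's name to the names
    of the later duplicates folded into it.
    """
    representative = {}  # uppercase sequence -> name of first record seen
    unique = []
    merged = {}
    for name, seq in records:
        key = seq.upper()
        if key in representative:
            merged[representative[key]].append(name)
        else:
            representative[key] = name
            merged[name] = []
            unique.append((name, seq))
    return unique, merged


if __name__ == "__main__":
    toy = [("s1", "ACGT"), ("s2", "acgt"), ("s3", "ACGA")]
    unique, merged = collapse_identical(toy)
    print(len(unique), "unique sequences;", merged)
```

Keeping the `merged` mapping makes it possible to put the duplicate names back onto the tree tips afterwards.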

0

Wow! Both Ninja and FastTree claim at least a 10x speedup over other similar NJ and ML methods, respectively. Definitely going to try them out. Thanks!

0

@bubaker: It would be great if you could report back what you found, both with these and for Paulo's suggestions!

1
14.5 years ago
Elipapa ▴ 90

MAFFT and MUSCLE are fast aligners. For speedy (and accurate) tree building few things are better than FastTree in my opinion.
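A hedged sketch of chaining the two is below. It assumes `mafft` and `FastTree` are on the PATH under those names (some installs ship a multi-threaded build called `FastTreeMP` instead), and the file names in the usage comment are hypothetical. Both tools write their result to standard output, so the helper redirects it to a file.

```python
import subprocess


def mafft_command(fasta, threads=16):
    """MAFFT with automatic strategy selection; alignment goes to stdout."""
    return ["mafft", "--auto", "--thread", str(threads), fasta]


def fasttree_command(alignment):
    """FastTree on a nucleotide alignment with the GTR model; tree goes to stdout."""
    return ["FastTree", "-gtr", "-nt", alignment]


def run_to_file(cmd, out_path):
    """Run a command and write its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Usage with hypothetical file names (requires both tools to be installed):
# run_to_file(mafft_command("viral_genes.fasta"), "viral_genes.aln.fasta")
# run_to_file(fasttree_command("viral_genes.aln.fasta"), "viral_genes.nwk")
```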

0
14.5 years ago
Biomed 5.0k

If you don't want to change the software approach but are looking for faster computing, I suggest you look at Amazon cloud services.

0

I would never suggest that. Why would that change anything? He already has access to a 16-core machine, so he would just be wasting money.
