Question

Choosing phylogenetic approaches

1

10.2 years ago

Yongjie Zhang ▴ 110

Hello all,

I have constructed phylogenetic trees for my data set using three different approaches: maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference (BI). When I submitted the paper, one reviewer commented on our phylogenetic approaches as follows:

I still do not really understand why MP is used to reconstruct phylogenetic trees, because BI and ML are used as well. It is of course a matter of choice, but if MP is used with molecular data, molecular evolution should be implemented as well as possible in the analysis, e.g. via adjusted Ti/Tv, step matrices, ... ideally not using ML to define the values used, else just use ML to make the tree if the analysis is feasible, as it is the case here

In my MP analysis, I treated gaps as a fifth base, so I could not also apply a Ti/Tv step matrix because the two settings conflict. Which phylogenetic approaches do you generally use? Also, can you suggest how to respond to this comment? Thanks.
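To make the reviewer's suggestion concrete, here is a minimal sketch of what a Ti/Tv step matrix for weighted parsimony looks like. The cost values (transitions 1, transversions 2) and the function name `step_matrix` are illustrative assumptions, not the settings of any particular program. The sketch also shows one plausible reading of the conflict: a four-state Ti/Tv matrix says nothing about changes to or from a gap, so a fifth state needs its own explicit cost.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def step_matrix(states="ACGT", ti_cost=1, tv_cost=2, gap_cost=None):
    """Return {(from_state, to_state): cost} for a symmetric step matrix.

    ti_cost, tv_cost and gap_cost are illustrative weights (assumptions),
    not values recommended by the thread.
    """
    matrix = {}
    for a in states:
        for b in states:
            if a == b:
                cost = 0
            elif "-" in (a, b):
                # A gap is neither a transition nor a transversion,
                # so its cost must be chosen separately.
                if gap_cost is None:
                    raise ValueError("gap is a state but no gap cost was given")
                cost = gap_cost
            elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
                cost = ti_cost  # A<->G or C<->T
            else:
                cost = tv_cost  # purine <-> pyrimidine
            matrix[(a, b)] = cost
    return matrix


four_state = step_matrix()                        # nucleotides only
five_state = step_matrix("ACGT-", gap_cost=2)     # gap as a fifth state
```

Whether a given parsimony program accepts a user-defined five-state matrix like `five_state` depends on the program, so this is worth checking in its documentation before replying to the reviewer.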

Yongjie

phylogeny • 4.2k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.2 years ago by Yongjie Zhang ▴ 110

4

Interesting precedent... crowdsourcing reviewer responses!

ADD REPLY • link 10.2 years ago by cdsouthan ★ 1.9k

1

MP is a relatively outdated method. I guess the reviewer meant that you should stick to ML and BI only, as they are apparently statistically more robust. NJ is old too, but if you do prefer NJ, Clearcut is the state-of-the-art program for NJ as of 2014. Nowadays computing power is no longer a big issue for ML (though it still is for BI). FastTree or ExaML can build trees from very large numbers of sequences in a short time. If some sites may be saturated, you should remove them from the dataset.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.1 years ago by qiyunzhu ▴ 130

0

This is a bit of a late comment, but while BI and ML are great methods under the right circumstances, I do feel that these methods have more issues than many are willing to admit or take the time to examine. Both require assumptions: about substitution models, about rate heterogeneity and how to deal with it, about whether a single model is appropriate along all edges of the tree, and so on. While MP is indeed an older method, in this age of abundant sequence data, if one can identify a subset of informative sites in an alignment that are likely to have undergone only a single character state change (there are ways of doing this), MP may actually be very effective. This lets one avoid making too many assumptions, which can be a good thing. Let us not throw out the baby with the bathwater on parsimony-based analysis.
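As a starting point for that kind of site selection, the sketch below lists the parsimony-informative columns of an alignment, using the standard definition: at least two character states, each present in at least two sequences. It is only a first filter and does not by itself show that a site changed only once. The function name, the toy alignment, and the choice to ignore gaps and ambiguity codes are assumptions for illustration.

```python
from collections import Counter


def parsimony_informative_sites(alignment, ignore="-N?"):
    """Return 0-based indices of parsimony-informative columns.

    alignment: list of equal-length sequence strings.
    ignore: characters skipped when counting states (an assumption;
            adjust if gaps are treated as a fifth state).
    """
    informative = []
    for col in range(len(alignment[0])):
        counts = Counter(
            seq[col] for seq in alignment if seq[col] not in ignore
        )
        # States seen in at least two sequences.
        shared = [state for state, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            informative.append(col)
    return informative


toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]  # hypothetical alignment
sites = parsimony_informative_sites(toy)
```

In the toy alignment the last column is variable but not informative, because the `C` occurs in only one sequence.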

ADD REPLY • link 9.9 years ago by confusedious ▴ 490

0

First of all, how diverged are your sequences?

If they are all very closely related and you aren't too worried about sites having changed more than once then MP is fine - though I'd still add confidence scores of some kind after bootstrapping. I like MP for closely related taxa because the assumptions made in stating explicit substitution models are avoided.

If your sequences represent taxa that diverged millions of years ago then I wouldn't bother with MP and would stick with a method that can deal with the potential for multiple and back mutations at the same site, i.e. ML or Bayesian tree building (or even NJ - this often gets overlooked, but it performs comparably to ML a lot of the time and is A LOT faster to run). With these, of course, you will need to be sure that the substitution model you have used is suitable too, so maybe use ModelTest if you haven't already.
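The effect of multiple hits can be illustrated with the simplest substitution model. The sketch below compares the observed proportion of differing sites (p-distance) with the Jukes-Cantor (JC69) corrected distance, d = -3/4 ln(1 - 4p/3). JC69 is used here only because it is the simplest model; it is an assumption for illustration, not a recommendation for any particular data set.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring gaps and ambiguity codes."""
    pairs = [
        (a, b) for a, b in zip(seq_a, seq_b)
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Expected substitutions per site under JC69 for observed p."""
    if p >= 0.75:
        raise ValueError("sequences are saturated under JC69")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.05, 0.30):
    print(f"observed p = {p:.2f}, JC69 distance = {jc69_distance(p):.3f}")
```

At 5% observed difference the correction is negligible, whereas at 30% the corrected distance is roughly 0.38 substitutions per site. That growing gap is the hidden change that counting observed differences alone cannot see.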

ADD REPLY • link 10.2 years ago by confusedious ▴ 490

0

Thanks for your reply.

I have two data sets. One is an alignment of different isolates of a single species; the other is an alignment of closely related species. The former has 8.3% variable sites; the latter has 23.1%. Do you think using all three approaches (MP, BI, ML) is appropriate?
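For readers who want to compute the same statistic on their own data, here is a minimal sketch of the fraction of variable sites in an alignment. How gaps were counted for the figures above is not stated, so ignoring gaps and ambiguity codes is an assumption, as are the function name and the toy alignment.

```python
def variable_site_fraction(alignment, ignore="-N?"):
    """Fraction of columns with more than one character state.

    alignment: list of equal-length sequence strings.
    ignore: characters not counted as states (an assumption).
    """
    length = len(alignment[0])
    variable = 0
    for col in range(length):
        states = {seq[col] for seq in alignment} - set(ignore)
        if len(states) > 1:
            variable += 1
    return variable / length


toy = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]  # hypothetical alignment
print(f"{100 * variable_site_fraction(toy):.1f}% variable sites")
```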

Yongjie

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Yongjie Zhang ▴ 110

0

To me, the first data set, with the lower proportion of variable sites, seems appropriate for MP, provided you are happy to accept that MP does not deal well with sites that evolve more rapidly than others.

The second data set seems inappropriate for MP to me unless you are going to remove sites that you think are rapidly evolving. I wouldn't bother with it, to be honest - I'd just stick to methods that incorporate a substitution model (again, you need to make sure you use something appropriate for your data, and the only way to do that well is to estimate which model is appropriate from the data itself). I would use NJ, ML or BI.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by confusedious ▴ 490

0

Which program do you prefer for building an NJ tree? As far as I know, some programs only offer a limited number of models, although I have never used NJ.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Yongjie Zhang ▴ 110

0

I use Geneious for NJ trees - mostly because the alignment viewer in this program is probably the best on the market. It gives you the option of several substitution models including HKY, TN and JC. If you decide to use NJ, I should mention that publications seem to be biased against its use (much as they are against MP). I really wish they weren't, as NJ often performs comparably with ML and is much faster to run.
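To show why NJ is so fast, here is a minimal sketch of the textbook neighbor-joining algorithm operating on a precomputed distance matrix. It is not how Geneious or any other program implements it; the function name, the Newick formatting, and the toy distances are assumptions for illustration.

```python
def neighbor_joining(names, dist):
    """Build an unrooted NJ tree and return it as a Newick string.

    names: list of taxon labels.
    dist: nested dict with dist[a][b] == dist[b][a] for all a != b.
    """
    nodes = list(names)
    d = {a: dict(dist[a]) for a in names}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimises the Q criterion.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to the new internal node.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distances from the new node to every remaining node.
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


# Hypothetical additive distances for four taxa.
toy = {
    "A": {"B": 0.3, "C": 0.9, "D": 1.0},
    "B": {"A": 0.3, "C": 1.0, "D": 1.1},
    "C": {"A": 0.9, "B": 1.0, "D": 0.7},
    "D": {"A": 1.0, "B": 1.1, "C": 0.7},
}
newick = neighbor_joining(["A", "B", "C", "D"], toy)
print(newick)
```

The input distances would normally be model-corrected (for example with JC, HKY or TN, as offered by the program mentioned above). Each step only updates a distance matrix, which is why NJ scales to data sets where a full likelihood search is slow.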

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by confusedious ▴ 490

1

I would disagree - distance-based approaches, often corrected under the appropriate model of sequence evolution, are often the only feasible way to deal with high-throughput datasets with hundreds/thousands of taxa and thousands/millions of bp.

RAxML and FastTree, however, are basically ubiquitous; both are likelihood-based (FastTree infers approximately-maximum-likelihood trees, while RAxML runs a heuristic ML search).

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by Brice Sarver ★ 3.8k

0

As I said, I personally like NJ methods. They are quick and work quite well in most situations. Give NJ a try and see how it works for your needs.

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by confusedious ▴ 490