Hello all,
I have constructed phylogenetic trees for my data set using three different approaches, namely maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference (BI). When I submitted the paper, one reviewer commented on our phylogenetic approaches as follows:
I still do not really understand why MP is used to reconstruct phylogenetic trees, because BI and ML are used as well. It is of course a matter of choice, but if MP is used with molecular data, molecular evolution should be implemented as well as possible in the analysis, e.g. via adjusted Ti/Tv, step matrices, ... ideally not using ML to define the values used, else just use ML to make the tree if the analysis is feasible, as it is the case here
In my MP analysis, I treated gaps as a fifth base, so I could not also set a Ti/Tv step matrix because the two settings conflict. Can you tell me which phylogenetic approaches you generally use? Also, can you suggest how to respond to this comment? Thanks.
Yongjie
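For readers unfamiliar with the step matrices the reviewer mentions, here is a minimal sketch of weighted (Sankoff) parsimony for a single alignment column on a fixed tree. The 1:2 transition/transversion weights, the toy tree, and all function names are illustrative assumptions, not settings taken from the manuscript or from any particular program.

```python
# Sketch only: Sankoff parsimony for one site under a Ti/Tv step matrix.
# The weights (Ti = 1, Tv = 2) and the toy tree are assumptions for illustration.

BASES = "ACGT"
PURINES = {"A", "G"}


def step_cost(a, b, ti_cost=1, tv_cost=2):
    """Cost of changing base a into base b under a simple Ti/Tv step matrix."""
    if a == b:
        return 0
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = (a in PURINES) == (b in PURINES)
    return ti_cost if same_class else tv_cost


def sankoff(node, ti_cost=1, tv_cost=2):
    """Return {base: minimum subtree cost if this node is assigned that base}.

    A leaf is a one-character string; an internal node is a tuple of children.
    """
    if isinstance(node, str):
        return {b: (0 if b == node else float("inf")) for b in BASES}
    totals = {b: 0 for b in BASES}
    for child in node:
        child_costs = sankoff(child, ti_cost, tv_cost)
        for b in BASES:
            totals[b] += min(
                step_cost(b, c, ti_cost, tv_cost) + child_costs[c] for c in BASES
            )
    return totals


def parsimony_length(tree, ti_cost=1, tv_cost=2):
    """Minimum weighted number of steps for this site on the given tree."""
    return min(sankoff(tree, ti_cost, tv_cost).values())


# Toy tree ((A,G),(C,T)) for a single site.
tree = (("A", "G"), ("C", "T"))
print(parsimony_length(tree))        # transversions cost twice as much
print(parsimony_length(tree, 1, 1))  # equal weights, i.e. unweighted parsimony
```

With equal weights the same function reduces to ordinary unweighted parsimony, so the only thing the step matrix changes is how much each kind of substitution contributes to tree length.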
Interesting precedent... crowdsourcing reviewer responses!
MP is a relatively outdated method. I guess the reviewer meant that you should stick to ML and BI only, as they are generally regarded as statistically more robust. NJ is old too, but if you do prefer NJ, Clearcut is the state-of-the-art program for NJ as of 2014. Nowadays computing power is no longer a big issue for ML (though it still is for BI). FastTree or ExaML can build trees from very large numbers of sequences in a short time. If there are some sites that may be saturated, you should remove them from the dataset.
This is a bit of a late comment, but while BI and ML are great methods under the right circumstances, I do feel that these methods have more issues than many are willing to admit or take the time to examine. Both require the making of assumptions. Assumptions about substitution models, assumptions about rate heterogeneity and how to deal with it, assumptions that a single model will be appropriate along all edges of the tree etc. While MP is indeed an older method, in this age of abundant sequence data if one can identify a subset of informative sites in an alignment that are likely to have undergone only a single character state change (there are ways of doing this), MP may actually be very effective. This enables one to avoid making too many assumptions and this can be a good thing. Let us not throw out the baby with the bathwater on parsimony-based analysis.
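As a starting point for picking out informative sites, here is a minimal sketch that flags parsimony-informative columns using the standard definition: at least two different states, each present in at least two sequences. This is only a first filter. It does not by itself identify sites that changed exactly once, which needs further criteria not spelled out in this thread. The function name and the choice to ignore gaps and ambiguity codes are assumptions.

```python
from collections import Counter


def parsimony_informative_sites(seqs, ignore="-?N"):
    """Return 0-based indices of parsimony-informative alignment columns.

    A column is informative when at least two different states each occur
    in at least two sequences. Characters in `ignore` are skipped
    (an assumption; gaps could instead be treated as a fifth state).
    """
    informative = []
    for i, column in enumerate(zip(*seqs)):
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        shared_states = [state for state, n in counts.items() if n >= 2]
        if len(shared_states) >= 2:
            informative.append(i)
    return informative


# Toy alignment: columns 1 and 3 are informative; column 4 has only a singleton C.
alignment = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
print(parsimony_informative_sites(alignment))
```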
First of all, how diverged are your sequences?
If they are all very closely related and you aren't too worried about sites having changed more than once then MP is fine - though I'd still add confidence scores of some kind after bootstrapping. I like MP for closely related taxa because the assumptions made in stating explicit substitution models are avoided.
If your sequences represent taxa that are millions of years diverged then I wouldn't bother with MP and would stick with a method that can deal with the potential for double and back mutations and the like, i.e. ML or Bayesian tree building (or even NJ; this often gets overlooked, but it performs comparably to ML a lot of the time and is A LOT faster to run). With these, of course, you will need to be sure that the substitution model you have used is suitable too, so maybe use ModelTest if you haven't already.
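To make the point about multiple hits concrete, here is a small sketch comparing the raw proportion of differing sites (p-distance) with the Jukes-Cantor (JC69) corrected distance, d = -3/4 ln(1 - 4p/3). JC69 is used only because it is the simplest correction; the function names and toy sequences are assumptions, and real analyses would use whichever model fits the data.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites, skipping gaps and ambiguity codes."""
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a, b):
    """Jukes-Cantor distance: expected substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTAT"
print(p_distance(seq1, seq2))     # observed differences per site
print(jc69_distance(seq1, seq2))  # slightly larger: allows for hidden changes
```

The corrected distance is always at least as large as the observed one, and the gap between them grows as sequences diverge, which is the situation where uncorrected counting of changes becomes unreliable.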
Thanks for your reply.
I have two data sets. One is an alignment of different isolates of a single species; the other is an alignment of closely related species. The former has 8.3% variable sites; the latter has 23.1% variable sites. Do you think my use of the three approaches (MP, BI, ML) is appropriate?
Yongjie
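For anyone wanting to compute the same kind of figure for their own data, here is a minimal sketch of the percentage of variable sites in an alignment. How the numbers above were actually obtained is not stated, so the treatment of gaps (ignored here) and the function name are assumptions.

```python
def percent_variable_sites(seqs, ignore="-?N"):
    """Percentage of alignment columns containing more than one state.

    Characters in `ignore` are skipped (an assumption; counting gaps as a
    fifth state would give a higher percentage for gappy alignments).
    """
    columns = list(zip(*seqs))
    if not columns:
        raise ValueError("empty alignment")
    variable = 0
    for column in columns:
        states = {c.upper() for c in column if c.upper() not in ignore}
        if len(states) > 1:
            variable += 1
    return 100.0 * variable / len(columns)


# Toy alignment: 2 of the 4 columns vary.
alignment = ["ACGT", "ACGA", "ACGT", "ATGT"]
print(percent_variable_sites(alignment))
```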
To me, the first data set, with the lower proportion of variable sites, seems appropriate for MP, provided you are relatively happy to live with the fact that MP will not successfully deal with sites that are evolving more rapidly than others.
The second data set seems inappropriate for MP to me unless you are going to remove sites that you think are rapidly evolving. I wouldn't bother with it to be honest - I'd just stick to methods that incorporate a substitution model (again, need to make sure you use something appropriate for your data and the only way to do that well is to estimate which model is appropriate from the data itself). I would use NJ, ML or BI.
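One common way to estimate a suitable model from the data is to compare candidate models by AIC, where AIC = 2k - 2 lnL and lower is better. The sketch below shows only the arithmetic. The log-likelihood values are made up for illustration; in practice they would come from fitting each model with an ML program. Only the substitution-model parameters are counted in k, on the assumption that branch lengths are the same for every model on a fixed tree.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# model: (HYPOTHETICAL log-likelihood, free substitution-model parameters)
# JC69: none; HKY85: kappa + 3 base frequencies; GTR: 5 rates + 3 frequencies.
candidates = {
    "JC69": (-2500.0, 0),
    "HKY85": (-2450.0, 4),
    "GTR": (-2448.5, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores)
print("Preferred model under AIC:", best)
```

With these invented numbers the extra parameters of GTR do not improve the likelihood enough to pay for themselves, so the intermediate model is preferred.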
What program do you prefer to use when building an NJ tree? As far as I know, some programs only allow a limited number of models, although I have never used NJ myself.
I use Geneious for NJ trees, mostly because the alignment viewer in this program is probably the best on the market. It gives you the option of several substitution models including HKY, TN and JC. If you decide to use NJ I should mention that journals seem to be biased against its use (much as they are against MP). I really wish they weren't, as NJ often performs comparably to ML and is much faster to run.
I would disagree - distance-based approaches, usually corrected under an appropriate model of sequence evolution, are often the only feasible way to deal with high-throughput datasets with hundreds/thousands of taxa and thousands/millions of bp.
RAxML and FastTree, however, are basically ubiquitous and are approximate-likelihood approaches.
As I said, I personally like NJ methods. Quick, and work quite well in most situations. Give it a try and see how it works for your needs.