I have a question about the use of maximum likelihood (ML) in phylogenetics. As we know, the likelihood of a phylogenetic tree inferred by ML can be very low, for example 0.0001, even if it is significantly higher than that of other trees. If this is true, why do people seem to trust ML phylogenetics more than other methods that may have higher accuracy? Thanks!
I'm not quite getting the "as we know" and "why do people seem to believe" parts of this, because I don't necessarily agree with your statement. Can you give us more background on how you are coming to state these things?
Accuracy in this context is the chance of arriving at the tree that is best supported by the data. Phylogenetic trees are hypotheses about what is likely to have happened in the past, so the idea of accuracy is hazy, since you don't have a time machine to sample DNA from the past.
In my opinion, it is wrong "to believe", or "to believe more", in ML or any other approach. In phylogenetics different methods are suited to different tasks, and I personally always compare ML and MP (parsimony), ML and BI (Bayesian inference), and distance-based and non-distance-based methods. It's also useful to remember that trees just represent how our "data" in "our model" or "our priors" should look under "our method".
Someone please correct me if I'm wrong.
I agree with all of the comments. I think a lot of people tend to downplay distance-based methods because character evolution information is lost when distances are calculated. But with more data, which is becoming available now, I think in the end we could capture as much phylogenetic signal from distances, too.
As others have suggested, you have to keep the definitions of likelihood, probability and accuracy clear here.
In statistics, likelihood refers to a particular quantity: the probability of observing a dataset given a statistical model (and its parameters), p(data | model + params). In phylogeny our data is an alignment and, if we are using likelihood-based methods, our model contains a tree relating individuals to each other and a substitution model (with its parameters). So the likelihood is p(alignment | tree + subs.model + params). The "maximum" bit of Maximum Likelihood refers to the fact that these methods search for the tree (and model parameters) which maximise the likelihood (you can think of this as finding the model that best fits the data).
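To make that definition concrete, here is a rough sketch in Python. It is a toy of my own, not how any real phylogenetics package works: the "tree" is just two sequences joined by one branch of length d, the substitution model is assumed to be Jukes-Cantor (JC69), and the two sequences and the function name `jc69_log_likelihood` are made up for illustration. It computes the log of p(alignment | tree + subs.model + params) and then searches a grid of branch lengths for the one that maximises it.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log of p(alignment | branch length d) for two sequences under JC69.

    d is the expected number of substitutions per site between the two
    sequences. Sites are treated as independent.
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability a site ends in the same base
    p_diff = 0.25 - 0.25 * e   # probability of one specific different base
    log_l = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the starting base under JC69
        log_l += math.log(0.25 * (p_same if x == y else p_diff))
    return log_l

# Made-up toy alignment: 20 sites, 2 of which differ
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACCTACGT"

# "Maximum" likelihood: try many branch lengths and keep the best-fitting one
grid = [i / 1000 for i in range(1, 1001)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))
best_log_l = jc69_log_likelihood(seq_a, seq_b, best_d)

# Sanity check against the well-known closed-form JC69 distance
p_obs = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
closed_form = -0.75 * math.log(1.0 - 4.0 * p_obs / 3.0)

print("ML branch length (grid):", best_d)
print("closed-form JC69 distance:", closed_form)
print("log-likelihood at the maximum:", best_log_l)
print("likelihood at the maximum:", math.exp(best_log_l))
```

Even at the best branch length the likelihood itself is a tiny number, and that is for only 20 sites. Real programs work with log-likelihoods for exactly this reason, and they search over tree topologies as well as branch lengths.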
Note that this is different from asking "what's the probability that this is The True Tree" (something a frequentist would say we can't even ask, let alone answer). It's the probability of getting the particular alignment we have, if the tree (and substitution model) being considered are true. For a large dataset any particular alignment is improbable (just as any particular deal of cards or run of coin tosses is improbable), so likelihoods associated with phylogenies are usually very low.
More generally, you have to be very careful with terms like "accuracy" in phylogeny. We usually can't measure this (a few experiments on viruses notwithstanding), and whatever criteria we might have to compare trees are actually statements about how well our data and our models fit. You can have strong support for the wrong tree if your models are bad, or if your data is biased in some way.
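The coin-toss analogy is easy to check numerically. The sketch below is only an illustration with made-up data (a run of 1000 tosses with exactly half heads, and a hypothetical helper `coin_log_likelihood`): it compares a fair-coin model with a heavily biased one. The absolute likelihood under the fair coin is astronomically small, yet it is still far higher than under the biased coin, and that comparison is the only thing the likelihood is used for.

```python
import math

def coin_log_likelihood(tosses, p_heads):
    """Log of p(this exact run of tosses | coin with P(heads) = p_heads)."""
    heads = tosses.count("H")
    tails = len(tosses) - heads
    return heads * math.log(p_heads) + tails * math.log(1.0 - p_heads)

# Made-up data: 1000 tosses, half heads and half tails
tosses = "HT" * 500

fair = coin_log_likelihood(tosses, 0.5)    # model 1: fair coin
biased = coin_log_likelihood(tosses, 0.9)  # model 2: coin that favours heads

print("log-likelihood, fair coin:  ", fair)
print("log-likelihood, biased coin:", biased)
print("likelihood, fair coin:      ", math.exp(fair))
print("fair model fits better:     ", fair > biased)
```

The same logic applies to alignments: a low likelihood for the best tree says the dataset is large, not that the tree is poorly supported.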