I work on molecular phylogenetic inference using maximum likelihood methods, but from the computer science side of things, as it is an interesting problem area for heuristic search. However, I would like to know a bit more about how biologists actually use inferred trees, and would appreciate it if someone could answer some questions that have come up after working with this for the last few months:
- If one infers a phylogenetic tree for a set of sequences, is the resulting tree considered “probably correct”? Or only a hypothesis?
- Are techniques such as bootstrapping used to help guide this belief? Other techniques?
- How many sequences is it common to infer a phylogeny from? I ask this because I know the problem complexity grows very quickly as more species are added (hence the need for heuristic search).
Bootstrapping is not a panacea for quantifying uncertainty. Say that you use a stupid inference procedure that always infers ((a,b),(c,d)) regardless of the data. Every bootstrap replicate would then return the same tree, so that grouping would have 100% bootstrap support even though the data played no part in it. Other, less stupid inference procedures can give similarly misleading bootstrap percentages when those percentages are interpreted as the posterior probability that the branch exists.
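To make the ((a,b),(c,d)) example concrete, here is a minimal sketch of a site-resampling bootstrap wrapped around such a constant procedure. The function names (`constant_inference`, `bootstrap_support`) and the toy alignment are made up for illustration and do not come from any real phylogenetics package. The alignment is deliberately built so that the sites group a with c and b with d, yet the reported support for ((a,b),(c,d)) still comes out at 100%.

```python
import random
from collections import Counter

# The split ((a,b),(c,d)), represented as an unordered pair of taxon sets.
AB_CD = frozenset({frozenset({"a", "b"}), frozenset({"c", "d"})})


def constant_inference(alignment):
    """Deliberately bad 'method': ignores the data and always returns ((a,b),(c,d))."""
    return AB_CD


def bootstrap_support(alignment, infer, replicates=200, seed=0):
    """Resample alignment columns with replacement and tally the inferred splits."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    counts = Counter()
    for _ in range(replicates):
        # One bootstrap replicate: draw n_sites column indices with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = {
            name: "".join(seq[i] for i in cols) for name, seq in alignment.items()
        }
        counts[infer(resampled)] += 1
    # Fraction of replicates in which each split was inferred.
    return {split: n / replicates for split, n in counts.items()}


# Toy alignment: every site groups a with c and b with d, not a with b.
alignment = {
    "a": "AAAAAAAAAA",
    "b": "CCCCCCCCCC",
    "c": "AAAAAAAAAA",
    "d": "CCCCCCCCCC",
}

support = bootstrap_support(alignment, constant_inference)
print(support[AB_CD])  # 1.0
```

The bootstrap only measures how stable the procedure's output is under resampling of the sites. It says nothing about whether the procedure is any good, which is why the percentage should not be read as a posterior probability.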