Say that I have found the following phylogenetic tree for four species (a, b, c, and d), and that this tree has a high likelihood:
            /\
        t₁ /  \ t₂
          /    \
         a     /\
           t₃ /  \ t₄        [tree 1]
             /    \
            /\     d
        t₅ /  \ t₆
          /    \
         b      c
If I want to “extract information” about only the two species a and b from the tree above, to the degree that this is possible,
            /\
        t₁ /  \ ?        [tree 2]
          /    \
         a      b
what should be my guess for the branch length marked with a question mark in the second tree?
I am going to use this “subtree” as the starting point for further heuristic search, so I want a good guess to reduce the search time.
After “pruning” c and d from tree 1, there are several options for the branch length between root and b in tree 2:
- it could be set to t₂+t₃+t₅
- b could be moved up to the t₃/t₄ branch, making it t₂
- it could be set to the average value of t₂, t₃, and t₅.
- it could be set to the branch length connecting b to its parent in the first tree, i.e. t₅.
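To make the four options concrete, here is a minimal Python sketch. The branch-length values, the nested-tuple tree representation, and the `prune` helper are all made up for illustration and are not taken from any phylogenetics library. Pruning keeps only the requested leaves and merges each resulting single-child node into its child by adding the two branch lengths, which corresponds to the first option; the other three candidates are computed directly for comparison.

```python
# Hypothetical branch lengths for tree 1 (illustrative values only).
t1, t2, t3, t4, t5, t6 = 0.30, 0.10, 0.20, 0.40, 0.15, 0.25

# A node is (name, branch_length_to_parent, children); leaves have no children.
tree1 = ("root", 0.0, [
    ("a", t1, []),
    ("n1", t2, [
        ("n2", t3, [("b", t5, []), ("c", t6, [])]),
        ("d", t4, []),
    ]),
])

def prune(node, keep):
    """Keep only leaves named in `keep`; merge single-child internal nodes
    into their child by summing the two branch lengths."""
    name, length, children = node
    if not children:
        return node if name in keep else None
    kept = [c for c in (prune(ch, keep) for ch in children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1 and name != "root":
        child_name, child_len, grandchildren = kept[0]
        return (child_name, length + child_len, grandchildren)
    return (name, length, kept)

tree2 = prune(tree1, {"a", "b"})
# Branch lengths from the root to each remaining leaf in tree 2.
lengths = {name: length for name, length, _ in tree2[2]}

candidates = {
    "sum t2+t3+t5 (path length preserved)": lengths["b"],
    "t2 only (b moved up)": t2,
    "mean of t2, t3, t5": (t2 + t3 + t5) / 3,
    "t5 only (original pendant branch)": t5,
}
for label, value in candidates.items():
    print(f"{label}: {value:.3f}")
```

With these made-up values, only the first candidate leaves the root-to-b distance from tree 1 unchanged; the branch to a stays at t₁ in every case.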
Do any of these options make more sense than the others? (Is there an obvious answer?) Is there any theory on this that I could look up?
[My initial thought was that t₂+t₃+t₅ is the best estimate since this conserves the time between the root and b which – assuming tree 1 is a good one – makes the states observed in b most likely.]
Thanks! That's really interesting, I will definitely look into the node density effect. :-)