Hi,
I'm trying to work out on paper the calculation behind the Robinson-Foulds distance. I don't mean the symmetric distance, but rather the definition below from DendroPy:
"This method returns the Robinsons-Foulds distance between two trees, i.e., the sum of the square of differences in branch lengths for equivalent splits between two trees, with the branch length for a missing split taken to be 0.0."
Assuming I had the two trees:
s1 = "((t5:0.161175,t6:0.161175):0.392293,((t4:0.104381,(t2:0.075411,t1:0.075411):0.028969):0.065840,t3:0.170221):0.383247)"
s2 = "((t5:2.161175,t6:0.161175):0.392293,((t4:0.104381,(t2:0.075411,t1:0.075411):1):0.065840,t3:0.170221):0.383247)"
So the clusters are:
(t1,t2)
(t1, t2, t3, t4)
(t1, t2, t3, t4, t5, t6)
(t1, t2, t4)
(t5, t6)
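For anyone following along, the clusters and their branch lengths can be listed with a few lines of plain Python. This is only a minimal sketch, not how DendroPy does it internally: the helper name `clusters_with_lengths` is made up for illustration, the trees are treated as rooted, and the parser assumes simple Newick strings like the two above (no quoted labels, no comments, no trailing semicolon).

```python
s1 = "((t5:0.161175,t6:0.161175):0.392293,((t4:0.104381,(t2:0.075411,t1:0.075411):0.028969):0.065840,t3:0.170221):0.383247)"
s2 = "((t5:2.161175,t6:0.161175):0.392293,((t4:0.104381,(t2:0.075411,t1:0.075411):1):0.065840,t3:0.170221):0.383247)"


def clusters_with_lengths(newick):
    """Map each cluster (frozenset of leaf names) to the length of the
    branch above it. Hypothetical helper for simple Newick strings only."""
    pos = 0
    result = {}

    def parse_length():
        # Read an optional ":<number>" at the current position.
        nonlocal pos
        if pos < len(newick) and newick[pos] == ":":
            pos += 1
            start = pos
            while pos < len(newick) and newick[pos] not in ",()":
                pos += 1
            return float(newick[start:pos])
        return None  # e.g. the root, which has no branch length here

    def parse_node():
        nonlocal pos
        if newick[pos] == "(":
            pos += 1  # skip "("
            leaves = set()
            while True:
                leaves |= parse_node()
                if newick[pos] == ",":
                    pos += 1  # another child follows
                    continue
                pos += 1  # skip ")"
                break
        else:
            start = pos
            while pos < len(newick) and newick[pos] not in ":,()":
                pos += 1
            leaves = {newick[start:pos]}
        result[frozenset(leaves)] = parse_length()
        return leaves

    parse_node()
    return result


c1 = clusters_with_lengths(s1)
c2 = clusters_with_lengths(s2)

# Print every cluster with its branch length in tree 1 and in tree 2.
for cluster in sorted(c1, key=lambda c: (len(c), sorted(c))):
    print(sorted(cluster), c1[cluster], c2.get(cluster))
```

Single-leaf clusters are included in the output because terminal branches carry lengths too.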
The branch lengths are the same in both trees with the exception of 2.161175 (tree 2) versus 0.161175 (tree 1, where it occurs twice). I already know from using the DendroPy package that the correct answer is 2.971031.
However, from the two input trees above, I would have thought that all equivalent splits produce a value of zero, and that (2.161175 - 0.161175) squared, counted twice, would give a value of 8.
I'm clearly wrong, but I'm not sure what I'm misunderstanding (maybe it's my understanding of 'equivalent' splits). A worked example would be really helpful, so any help would be appreciated.
Many thanks in advance.
Originally posted as an answer by amod47463; deleted and pasted here as a comment
Hi Joseph,
Thanks for the reply. Just to clarify: do the workings above refer to the "Euclidean distance", which is defined on the DendroPy website (http://pythonhosted.org/DendroPy/tutorial/treestats.html) as:
Would you say this is a correct definition of the above calculation (it seems to be)? In the DendroPy example, however, they give an answer of 2.2232636377544162 for the Euclidean distance.
On the other hand, they define the Robinson-Foulds algorithm as:
I would take this (based on the above) to be:
Any further clarification greatly appreciated.
The value of 2.971031 corresponds to the sum of absolute differences in branch lengths: |2.161175 - 0.161175| + |1 - 0.028969| = 2 + 0.971031. The value of 2.223264 (i.e. sqrt(4.94290120), where 4.94290120 = 2^2 + 0.971031^2) corresponds to the Euclidean distance, as explained by David Winter. It would be great if you got in touch with the DendroPy developers to ask them to make the relevant clarifications.
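To check the two numbers by hand, here is a small sketch in plain Python. It does not call DendroPy; it hard-codes the only two branches whose lengths differ between the trees in the question (the terminal branch to t5 and the branch above the (t1,t2) cluster), and the variable names are just for illustration.

```python
import math

# Branch lengths that differ between the two trees: (tree 1, tree 2).
# Every other branch has the same length in both and contributes zero.
differing = {
    "t5": (0.161175, 2.161175),
    "(t1,t2)": (0.028969, 1.0),
}

diffs = [b - a for a, b in differing.values()]

sum_abs = sum(abs(d) for d in diffs)   # sum of absolute differences
sum_sq = sum(d * d for d in diffs)     # sum of squared differences
euclid = math.sqrt(sum_sq)             # Euclidean distance

print(round(sum_abs, 6))  # 2.971031
print(round(sum_sq, 6))   # 4.942901
print(round(euclid, 6))   # 2.223264
```

So the 2.971031 reported for these trees matches the sum of absolute differences, and the square root of the sum of squares gives the Euclidean value.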