Question

Automated methods for comparing trees

9 months ago

Alex Reynolds 36k

I have two trees, A and B, derived from separate agglomerative hierarchical clustering runs on two datasets (see figure).

I pick a node from tree A, at some depth. I take the signal associated with the leaves of that node and aggregate it (take the mean at each position, say). This gives me a vector of signal for that node.

I repeat for tree B, getting another signal vector specific to leaves from the node off of tree B.

I run some distance function over those two signal vectors to get a score.
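To make the three steps above concrete, here is a minimal sketch in plain Python. The names `mean_signal` and `euclidean` and the toy signal values are hypothetical, chosen only for illustration; each leaf is assumed to carry a signal vector of the same length, and Euclidean distance stands in for whatever distance function is actually used.

```python
import math

def mean_signal(leaf_signals):
    """Position-wise mean of equal-length signal vectors (one per leaf)."""
    n = len(leaf_signals)
    return [sum(column) / n for column in zip(*leaf_signals)]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical leaf signals under one node of tree A (2 leaves)
# and one node of tree B (3 leaves).
node_a = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
node_b = [[1.0, 2.0, 3.0], [2.0, 1.0, 3.0], [3.0, 3.0, 0.0]]

vec_a = mean_signal(node_a)
vec_b = mean_signal(node_b)
score = euclidean(vec_a, vec_b)
```

In this toy case the two nodes have different numbers of leaves and different leaf signals, yet their aggregate vectors coincide, so the score is zero.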

My question is: Are there algorithms for doing this in an automated way, which optimize for the distance score?

In my sketch below, for instance, I have nodes from trees A and B that are very different in constitution: tree B's node has many more members than tree A's node. But their aggregate signals could still be very similar under Euclidean or another distance metric.

I want some rigorous way of identifying those "best-matching" nodes, regardless of differences between their leaf content.

The combinatorics of node selection might make testing prohibitive. So I thought going to the same normalized tree depth would be helpful, as a start.
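If only single-node pairs are scored, the search may be smaller than it first appears: a rooted binary tree with n leaves has n - 1 internal nodes, so comparing every internal node of A against every internal node of B takes on the order of n_A × n_B distance evaluations. The sketch below is one brute-force way to do that, not an established algorithm. It assumes each tree is written as nested tuples of leaf names, that `signals_a` and `signals_b` map leaf names to equal-length vectors, and that Euclidean distance is the score; `collect_nodes`, `best_match`, and the toy data are hypothetical.

```python
import math

def collect_nodes(tree, signals, out):
    """Post-order walk over a nested-tuple tree.

    Appends (leaf_names, mean_vector) for every internal node to `out`
    and returns (leaf_names, sum_vector) for the subtree `tree`.
    """
    if isinstance(tree, str):  # a leaf is just its name
        return [tree], list(signals[tree])
    names, total = [], None
    for child in tree:
        child_names, child_sum = collect_nodes(child, signals, out)
        names += child_names
        if total is None:
            total = child_sum
        else:
            total = [a + b for a, b in zip(total, child_sum)]
    out.append((names, [x / len(names) for x in total]))
    return names, total

def best_match(tree_a, signals_a, tree_b, signals_b):
    """Return (distance, leaves_a, leaves_b) for the closest pair of internal nodes."""
    nodes_a, nodes_b = [], []
    collect_nodes(tree_a, signals_a, nodes_a)
    collect_nodes(tree_b, signals_b, nodes_b)
    return min(
        ((math.dist(va, vb), la, lb) for la, va in nodes_a for lb, vb in nodes_b),
        key=lambda item: item[0],
    )

# Hypothetical toy data.
tree_a = (("a1", "a2"), "a3")
signals_a = {"a1": [1, 2, 3], "a2": [3, 2, 1], "a3": [9, 9, 9]}
tree_b = (("b1", ("b2", "b3")), "b4")
signals_b = {"b1": [1, 2, 3], "b2": [2, 1, 3], "b3": [3, 3, 0], "b4": [0, 0, 0]}

distance, leaves_a, leaves_b = best_match(tree_a, signals_a, tree_b, signals_b)
```

Here the two-leaf node of A is matched to the three-leaf node of B. Subtree sums are accumulated bottom-up, so each node's mean is computed without revisiting its leaves. Restricting both node lists to a band of normalized depth would shrink the search further.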

Before I try reinventing the wheel, are there approaches for doing this which are "rigorous", "efficient", or are there other aspects I am overlooking? Thanks!

Tree sketch

Updated 9 months ago by Mensur Dlakic • written 9 months ago by Alex Reynolds

score 1 · Answer 1 · 2024-06-10

Ete3 has a tree compare function:

http://etetoolkit.org/
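Tree-comparison functions of this kind commonly report topology measures such as the Robinson-Foulds distance. The sketch below is not ETE's API; it is a plain-Python illustration of the rooted version of that measure, assuming both trees are nested tuples over the same set of leaf names. The names `clades` and `rooted_rf` are hypothetical. Note that it compares topology only, not the aggregated signal described in the question.

```python
def clades(tree, out):
    """Return the leaf set under `tree`, adding every internal node's leaf set to `out`."""
    if isinstance(tree, str):  # a leaf is just its name
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves

def rooted_rf(tree_1, tree_2):
    """Count the clades (leaf sets of internal nodes) found in only one of the two trees."""
    clades_1, clades_2 = set(), set()
    if clades(tree_1, clades_1) != clades(tree_2, clades_2):
        raise ValueError("both trees must have the same leaf names")
    return len(clades_1 ^ clades_2)

# Hypothetical toy trees over the leaves a, b, c, d.
t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")

rf_same = rooted_rf(t1, t1)
rf_diff = rooted_rf(t1, t2)
```

The two toy trees differ only in which pair is grouped first, so exactly two clades, {a, b} and {a, c}, are unshared.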

This may also be of interest:

https://github.com/rrnewton/PhyBin

PhyloDM has some functions for branch distance calculations:

https://github.com/aaronmussig/PhyloDM

If you calculate all-vs-all distances within each tree, reducing those distance matrices to 2D might give you an easy way to highlight the differences between the trees.