Get distance matrix based on node tree
4.9 years ago
Chvatil ▴ 130

Hi, I need some help getting a phylogenetic distance matrix.

Here are two examples:

Example one:

import ete3
tree = ete3.Tree('(((A,B),C),D);')
print(tree)

         /-A
      /-|
   /-|   \-B
  |  |
--|   \-C
  |
   \-D

The matrix should then be:

    A   B   C   D
A   0   1   2   3
B   1   0   2   3
C   2   2   0   3
D   3   3   3   0

As you can see, A and B are the closest leaves, C is closer to A and B than it is to D, and D is the furthest leaf.

Example two (more complex):

tree = ete3.Tree('((((((A,B),C),D),(E,F)),G),(H,I));')
print(tree)

                  /-A
               /-|
            /-|   \-B
           |  |
         /-|   \-C
        |  |
      /-|   \-D
     |  |
     |  |   /-E
   /-|   \-|
  |  |      \-F
  |  |
--|   \-G
  |
  |   /-H
   \-|
      \-I

Here I should get the following matrix:

    A   B   C   D   E   F   G   H   I
A   0   1   2   3   4   4   5   6   6
B   1   0   2   3   4   4   5   6   6
C   2   2   0   3   4   4   5   6   6
D   3   3   3   0   4   4   5   6   6
E   4   4   4   4   0   1   5   6   6
F   4   4   4   4   1   0   5   6   6
G   5   5   5   5   5   5   0   6   6
H   6   6   6   6   6   6   6   0   1
I   6   6   6   6   6   6   6   1   0

I tried the get_distance function in ete3, but it does not give a matrix based on node distance.

tree distance matrix phylogeny • 2.9k views
4.9 years ago
Joe 22k

If you're happy to use dendropy instead of ete3 (both are very good), then it could be done as simply as:

import dendropy
tree = dendropy.Tree.get(path='path/to/tree.tree', schema='newick') # or whatever relevant format if not newick
pdm = tree.phylogenetic_distance_matrix()
pdm.to_csv('/path/to/output.csv')

Thank you, but pdm.to_csv('/path/to/output.csv') gives AttributeError: 'PhylogeneticDistanceMatrix' object has no attribute 'to_csv'. Anyway, I tried:

for i, t1 in enumerate(tree.taxon_namespace[:-1]):
    for t2 in tree.taxon_namespace[i+1:]:
        print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))

But I only get distances of zero between leaves:

Distance between 'A' and 'B': 0.0
Distance between 'A' and 'C': 0.0
Distance between 'A' and 'D': 0.0
Distance between 'A' and 'E': 0.0
Distance between 'A' and 'F': 0.0
Distance between 'A' and 'G': 0.0
Distance between 'A' and 'H': 0.0
Distance between 'A' and 'I': 0.0
Distance between 'B' and 'C': 0.0

...


Ah, sorry, I think it's .write_csv(). Check the package documentation.

You are getting zero distances because your tree is topological only: it has no branch lengths. You can artificially 'fudge' this by making a cladogram of your tree and setting all the branch lengths to 1. Effectively, your nodes have no distance in the normal sense for a tree, just hierarchical relationships.

I'm not aware of any built-in functionality to calculate this based just on the 'rank'/'cardinality' of the nodes. It would be doable in principle by calculating the pairwise node ranks, but that's far more work than just faking a cladogram and using the built-in methods.
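
For what it's worth, both matrices in the question are consistent with one reading of "rank": the distance between two leaves is the height of their most recent common ancestor, where height means the largest number of edges from that node down to any leaf below it. That reading is inferred from the two examples, not taken from ete3 or dendropy. Note also that setting every branch length to 1 gives path lengths instead (A to B would be 2, since two edges separate them), so it would not reproduce those matrices exactly. Below is a minimal pure-Python sketch under that assumption. The helper names parse_newick and rank_distance_matrix are hypothetical, and the parser only handles topology-only Newick strings with no branch lengths or internal node labels.

```python
from itertools import combinations


def parse_newick(newick):
    """Parse a topology-only Newick string into nested lists of leaf names.

    Assumption: no branch lengths, no internal node labels, no quoted names.
    """
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            pos += 1  # consume ")"
            return children
        # Otherwise this is a leaf: read its name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].strip()

    return parse_node()


def rank_distance_matrix(tree):
    """Return (leaves, dist), where dist[a][b] is the height of the MRCA of a and b."""
    dist = {}

    def visit(node):
        # Returns (leaf names under this node, height of this node).
        if isinstance(node, str):
            dist[node] = {node: 0}
            return [node], 0
        results = [visit(child) for child in node]
        height = 1 + max(h for _, h in results)
        # Leaves that sit under different children first meet at this node.
        for (left, _), (right, _) in combinations(results, 2):
            for a in left:
                for b in right:
                    dist[a][b] = dist[b][a] = height
        return [leaf for leaves, _ in results for leaf in leaves], height

    leaves, _ = visit(tree)
    return leaves, dist


leaves, dist = rank_distance_matrix(parse_newick("(((A,B),C),D);"))
print("    " + "   ".join(leaves))
for a in leaves:
    print(a + "   " + "   ".join(str(dist[a][b]) for b in leaves))
```

The same post-order idea should carry over to an ete3 or dendropy tree by walking child nodes instead of nested lists; the nested-list form is used here only to keep the sketch free of dependencies.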
