Script To Calculate A Distance Matrix Based On Tree File
5
3
Entering edit mode
13.8 years ago

Is there a script somewhere around (MATLAB, R, PERL) that calculates a distance matrix based on a tree file? Hence, I already have the tree but want all the distance information in a matrix.

distance matrix phylogenetics tree • 25k views
ADD COMMENT
0
Entering edit mode

So, the matrix would show distances between all leafs of the tree?

ADD REPLY
0
Entering edit mode

The matrix would show distances between all leafs of the tree?

ADD REPLY
0
Entering edit mode

Indeed, probably it would also calculate the distance between all internal nodes but that might even turn out handy

ADD REPLY
11
Entering edit mode
13.7 years ago
Raquel Dsl ▴ 110

If you are interested in an another solution (in R) try this

library(ape)
tree<-read.tree("/file.tre")
PatristicDistMatrix<-cophenetic(tree)
ADD COMMENT
2
Entering edit mode

I think the last line should be PatristicDistMatrix<-cophenetic.phylo(tree)

ADD REPLY
0
Entering edit mode

tree should be a phylo object, Do you know any function which can take an object of clade.matrix?

ADD REPLY
0
Entering edit mode

This is an old reply, but I found this and it misled me to spend time understanding the ape::cophenetic.phylo() function.

I have an integer distance tree and it returned nonsense (as far as I can tell). The function this answer alludes is stats::cophenetic() and it works wonderfully with phylo objects.

I validated the results with the output from the T-Rex server.

ADD REPLY
0
Entering edit mode

Indeed, thanks a lot, this seems applicable in scripts.

ADD REPLY
4
Entering edit mode
13.7 years ago

There is a Newick->distance matrix converter in the T-rex package, available as a web-server and as source code. There are no details as far as I can see describing what this method actually does.

To calculate the patristic distance (i.e. the distance between two leaves measured along the tree) from a variety of tree formats between all pairs of leaves, you can try the PATRISTIC Java application.

It also looks like the DendroPy package can generate patristic distance from a variety of tree formats. E.g. from the Treestats tutorial: http://packages.python.org/DendroPy/tutorial/treestats.html

#! /usr/bin/env python

import dendropy
from dendropy import treecalc

tree = dendropy.Tree.get_from_path("input.nex", "nexus")
pdm = treecalc.PatristicDistanceMatrix(tree)
for i, t1 in enumerate(tree.taxon_set):
    for t2 in tree.taxon_set[i+1:]:
        print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))
ADD COMMENT
0
Entering edit mode

Casey!

Great I ran the webserver with some simple trees and it runs OK. It has some problems with nodes but that is likely a matter of not including the nodes in the tree file. Great! No I will try with the source (prefer to pipe it) and see what the others do! Thanks really a great help.

Arjen

PS Didn't you use to work on polygalacturonases?

ADD REPLY
0
Entering edit mode

@Arjen - I think Raquel DSL may have the scriptable solution you are looking for above.

ADD REPLY
2
Entering edit mode
13.8 years ago

Normally, trees are generated from a distance matrix and you certainly lose information while constructing the tree. What you could get from a tree would be the sum of branch lengths for each pair of nodes. This might be possible in R by loading the tree with ape, then converting to igraph with as.igraph and finally computing the shortest paths.

ADD COMMENT
0
Entering edit mode

Thanks for this solution, seems the way except that I never worked with R. I managed to install R as well as ape and igraph, opened the tree using ape but then I cannot figure out how to use as.igraph The usage is as.igraph(object) but I do not have a clue how I should refer to my object.

ADD REPLY
0
Entering edit mode

I managed a bit further. I got it converted to igraph but now igraph does not do what I want, probably due to my misunderstanding of how to use the commands. If I put

shortest.paths(graph, v=132, weights = NULL)

I get the same as when I put

shortest.paths(graph, v=132)

which is basically the amount of edges between nodes

Hence weight is not included properly

To be sure: I transferred "mytree" from ape to igraph as "graph", it has 133 edges, hence 0-132. Then by putting weights = NULL it should include the weights per edge BUT some edges have a value of 0.0, could this be the problem?

ADD REPLY
0
Entering edit mode

I haven't used igraph very much, but perhaps you can write your graph e.g. by write.graph(graph, "graph.ncol", format="ncol") and see how it looks? http://igraph.sourceforge.net/igraphbook/igraphbook-foreign.html

ADD REPLY
0
Entering edit mode

Indeed, no weights appear to be present....

ADD REPLY
2
Entering edit mode
9.6 years ago
guanhomer ▴ 20

I wrote a code in Perl myself. It works for me.

I used the example tree from http://en.wikipedia.org/wiki/Newick_format

The output of the following code is:

       A    B      C    D
A    0    0.3    0.9    1
B    0.3    0    1    1.1
C    0.9    1    0    0.7
D    1    1.1    0.7    0
#!/usr/bin/perl -w
use strict;

my $tree = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);";

##record the distance of parentheses
my %dis;
my $par = -1;
my @current;
while($tree =~ /./g)
    {if ($& eq '(')
        {$par ++;
        next if $par == 0;
        $current[$#current+1] = $par;
        }
    elsif($& eq ')')
        {(my $tem) = $' =~ /:(\d+\.\d+|\d+)/;
        next if $#current == -1;
        $dis{'node_'.$current[$#current]} = $tem;
        pop @current;
        }
    }

##record the distance of leaves
my @order;
while ($tree =~ /([^\(\):,]+):(\d+\.\d+|\d+)/g)
    {$dis{$1} = $2;
    $order[$#order+1] = $1;
    }

##record parents of leaves
my %pare;
@current = ();
$par = -1;
while($tree =~ /(\(|\)|([^\(\):,]+):)/g)
    {if ($& eq '(')
        {$par ++;
        next if $par == 0;
        $current[$#current+1] = $par;
        }
    elsif($& eq ')')
        {pop @current;
        }
    else{map {$pare{$2}{$_} = 1} @current;
        $pare{$2} = [@current];
        }
    }

##Distance matrix
my %dis2;
foreach my $i (0..$#order)
    {foreach my $j ($i..$#order)
        {if ($i == $j)
            {$dis2{$order[$i]}{$order[$j]} = 0;
            }
        else{my $tem = $dis{$order[$i]} + $dis{$order[$j]};
            my $tem2 = -1;
            foreach my $k (0..$#{$pare{$order[$i]}})
                {last if ($k > $#{$pare{$order[$j]}});
                if ($pare{$order[$i]}[$k] eq $pare{$order[$j]}[$k])
                    {$tem2 = $k;
                    }
                }
            if ($#{$pare{$order[$i]}} != -1)
                {map {$tem += $dis{'node_'.$_}} map {$pare{$order[$i]}[$_]} ($tem2+1)..$#{$pare{$order[$i]}};
                }
            if ($#{$pare{$order[$j]}} != -1)
                {map {$tem += $dis{'node_'.$_}} map {$pare{$order[$j]}[$_]} ($tem2+1)..$#{$pare{$order[$j]}};
                }
            $dis2{$order[$i]}{$order[$j]} = $dis2{$order[$j]}{$order[$i]} = $tem;
            }
        }
    }

##output
print join("\t",'',@order),"\n";
foreach my $i (@order)
    {print join("\t",$i,map {$dis2{$i}{$_}} @order),"\n";
    }
close(OUT);
ADD COMMENT
0
Entering edit mode

This was amazingly fast, and my jaw dropped

ADD REPLY
0
Entering edit mode

This is brilliant! Thank you!!

ADD REPLY
1
Entering edit mode
13.7 years ago
Zhidkov ▴ 600

You can try BioPerl modules: http://bioperl.org/wiki/HOWTO:Trees

ADD COMMENT

Login before adding your answer.

Traffic: 2590 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6