Is there a script somewhere around (MATLAB, R, PERL) that calculates a distance matrix based on a tree file? Hence, I already have the tree but want all the distance information in a matrix.
Is there a script somewhere around (MATLAB, R, PERL) that calculates a distance matrix based on a tree file? Hence, I already have the tree but want all the distance information in a matrix.
If you are interested in an another solution (in R) try this
library(ape)
tree<-read.tree("/file.tre")
PatristicDistMatrix<-cophenetic(tree)
This is an old reply, but I found this and it misled me to spend time understanding the ape::cophenetic.phylo()
function.
I have an integer distance tree and it returned nonsense (as far as I can tell). The function this answer alludes is stats::cophenetic() and it works wonderfully with phylo objects.
I validated the results with the output from the T-Rex server.
There is a Newick->distance matrix converter in the T-rex package, available as a web-server and as source code. There are no details as far as I can see describing what this method actually does.
To calculate the patristic distance (i.e. the distance between two leaves measured along the tree) from a variety of tree formats between all pairs of leaves, you can try the PATRISTIC Java application.
It also looks like the DendroPy package can generate patristic distance from a variety of tree formats. E.g. from the Treestats tutorial: http://packages.python.org/DendroPy/tutorial/treestats.html
#! /usr/bin/env python
import dendropy
from dendropy import treecalc
tree = dendropy.Tree.get_from_path("input.nex", "nexus")
pdm = treecalc.PatristicDistanceMatrix(tree)
for i, t1 in enumerate(tree.taxon_set):
for t2 in tree.taxon_set[i+1:]:
print("Distance between '%s' and '%s': %s" % (t1.label, t2.label, pdm(t1, t2)))
Casey!
Great I ran the webserver with some simple trees and it runs OK. It has some problems with nodes but that is likely a matter of not including the nodes in the tree file. Great! No I will try with the source (prefer to pipe it) and see what the others do! Thanks really a great help.
Arjen
PS Didn't you use to work on polygalacturonases?
Normally, trees are generated from a distance matrix and you certainly lose information while constructing the tree. What you could get from a tree would be the sum of branch lengths for each pair of nodes. This might be possible in R by loading the tree with ape, then converting to igraph with as.igraph
and finally computing the shortest paths.
Thanks for this solution, seems the way except that I never worked with R. I managed to install R as well as ape and igraph, opened the tree using ape but then I cannot figure out how to use as.igraph The usage is as.igraph(object) but I do not have a clue how I should refer to my object.
I managed a bit further. I got it converted to igraph but now igraph does not do what I want, probably due to my misunderstanding of how to use the commands. If I put
shortest.paths(graph, v=132, weights = NULL)
I get the same as when I put
shortest.paths(graph, v=132)
which is basically the amount of edges between nodes
Hence weight is not included properly
To be sure: I transferred "mytree" from ape to igraph as "graph", it has 133 edges, hence 0-132. Then by putting weights = NULL
it should include the weights per edge BUT some edges have a value of 0.0, could this be the problem?
I haven't used igraph very much, but perhaps you can write your graph e.g. by write.graph(graph, "graph.ncol", format="ncol")
and see how it looks? http://igraph.sourceforge.net/igraphbook/igraphbook-foreign.html
I wrote a code in Perl myself. It works for me.
I used the example tree from http://en.wikipedia.org/wiki/Newick_format
The output of the following code is:
A B C D
A 0 0.3 0.9 1
B 0.3 0 1 1.1
C 0.9 1 0 0.7
D 1 1.1 0.7 0
#!/usr/bin/perl -w
use strict;
my $tree = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);";
##record the distance of parentheses
my %dis;
my $par = -1;
my @current;
while($tree =~ /./g)
{if ($& eq '(')
{$par ++;
next if $par == 0;
$current[$#current+1] = $par;
}
elsif($& eq ')')
{(my $tem) = $' =~ /:(\d+\.\d+|\d+)/;
next if $#current == -1;
$dis{'node_'.$current[$#current]} = $tem;
pop @current;
}
}
##record the distance of leaves
my @order;
while ($tree =~ /([^\(\):,]+):(\d+\.\d+|\d+)/g)
{$dis{$1} = $2;
$order[$#order+1] = $1;
}
##record parents of leaves
my %pare;
@current = ();
$par = -1;
while($tree =~ /(\(|\)|([^\(\):,]+):)/g)
{if ($& eq '(')
{$par ++;
next if $par == 0;
$current[$#current+1] = $par;
}
elsif($& eq ')')
{pop @current;
}
else{map {$pare{$2}{$_} = 1} @current;
$pare{$2} = [@current];
}
}
##Distance matrix
my %dis2;
foreach my $i (0..$#order)
{foreach my $j ($i..$#order)
{if ($i == $j)
{$dis2{$order[$i]}{$order[$j]} = 0;
}
else{my $tem = $dis{$order[$i]} + $dis{$order[$j]};
my $tem2 = -1;
foreach my $k (0..$#{$pare{$order[$i]}})
{last if ($k > $#{$pare{$order[$j]}});
if ($pare{$order[$i]}[$k] eq $pare{$order[$j]}[$k])
{$tem2 = $k;
}
}
if ($#{$pare{$order[$i]}} != -1)
{map {$tem += $dis{'node_'.$_}} map {$pare{$order[$i]}[$_]} ($tem2+1)..$#{$pare{$order[$i]}};
}
if ($#{$pare{$order[$j]}} != -1)
{map {$tem += $dis{'node_'.$_}} map {$pare{$order[$j]}[$_]} ($tem2+1)..$#{$pare{$order[$j]}};
}
$dis2{$order[$i]}{$order[$j]} = $dis2{$order[$j]}{$order[$i]} = $tem;
}
}
}
##output
print join("\t",'',@order),"\n";
foreach my $i (@order)
{print join("\t",$i,map {$dis2{$i}{$_}} @order),"\n";
}
close(OUT);
You can try BioPerl modules: http://bioperl.org/wiki/HOWTO:Trees
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
So, the matrix would show distances between all leafs of the tree?
The matrix would show distances between all leafs of the tree?
Indeed, probably it would also calculate the distance between all internal nodes but that might even turn out handy