Measuring the difference or similarity between two phylogenies
8.5 years ago
confusedious ▴ 490

Hello everyone,

I'd like to take two phylogenies for a given nucleotide alignment and create a measure of the similarity between the two trees. Branch length is not of interest to me in this case; I only care about the branching order or the edges shared.

I would like to do this in R, and ideally the method should be able to handle non-bifurcating trees. Further, all trees I am working with are unrooted.

I know how to calculate the symmetric difference using phangorn in R, but it can only use bifurcating trees. Further, I'd like to obtain a result scaled between 0 and 1. The symmetric distance is reported as a count of partitions not shared by the trees, whereas I'd like something like the proportion of edges or paths shared, so that it is easily comparable across trees built from different alignments.

Does anyone have any suggestions? The tree files I have are in Newick format.

R phylogenetics • 4.3k views

What is actually the difference between an NJ tree and a UPGMA tree? Secondly, I want to construct an NJ tree from genetic distance values. Please suggest some online software, along with its input data format, because I am facing many problems with data entry.


Please do not ask the same question in multiple threads. In particular, do not use the "Submit Answer" section to ask a new question in an existing thread.

You should start a new thread/post for new questions.

8.4 years ago
jhc ★ 3.0k

Not an R package, but this command line tool may help you: http://etetoolkit.org/documentation/ete-compare/. Most important features:

  • Robinson-Foulds symmetric difference in rooted and unrooted trees
  • Percentage of edge similarity (number of branches in one tree that are present in another)
  • Distance can be calculated between trees containing duplicated attributes
  • Matches and mismatches can be dumped
  • Can compare trees of different sizes
  • Can ignore lowly supported branches
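
For anyone who wants to see what a normalized, topology-only comparison looks like under the hood, here is a minimal pure-Python sketch. It is not how ete-compare or phangorn implement it; the function names (`parse_newick`, `splits`, `normalized_rf`) are made up for illustration. It assumes both Newick strings have the same, unique leaf labels, ignores branch lengths and internal labels, and treats the trees as unrooted. The normalization (symmetric difference divided by the total number of non-trivial splits in both trees) is one possible choice that stays in the 0 to 1 range even when the trees contain polytomies.

```python
import re


def parse_newick(newick):
    """Parse a Newick string into nested lists of leaf names (topology only)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, current, prev = [], [], None
    for tok in tokens:
        if tok == "(":
            stack.append(current)
            current = []
        elif tok == ")":
            finished = current
            current = stack.pop()
            current.append(finished)
        elif tok not in ",;":
            label = tok.split(":")[0].strip()  # drop any branch length
            # Text right after ")" is an internal label or support value: skip it.
            if prev != ")" and label:
                current.append(label)
        prev = tok
    return current[0]


def splits(tree):
    """Return (leaf set, set of non-trivial bipartitions) for an unrooted tree."""
    clades = set()

    def leaves_below(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        clades.add(below)
        return below

    all_leaves = leaves_below(tree)
    bipartitions = set()
    for clade in clades:
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            # Store each split as an unordered pair of sides, so rooting does not matter.
            bipartitions.add(frozenset([clade, other]))
    return all_leaves, bipartitions


def normalized_rf(newick_a, newick_b):
    """Symmetric difference of splits, scaled to 0 (identical) .. 1 (nothing shared)."""
    leaves_a, splits_a = splits(parse_newick(newick_a))
    leaves_b, splits_b = splits(parse_newick(newick_b))
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf set")
    total = len(splits_a) + len(splits_b)
    if total == 0:  # two star trees: nothing to disagree about
        return 0.0
    return len(splits_a ^ splits_b) / total


if __name__ == "__main__":
    resolved = "((A,B),(C,D),E);"
    polytomy = "((A,B),C,D,E);"  # non-bifurcating tree
    print(normalized_rf(resolved, polytomy))
```

If you prefer a similarity rather than a distance, `1 - normalized_rf(a, b)` gives the fraction of splits shared under the same normalization.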

This looks promising - I'll give it a try. Thank you.

8.4 years ago

Look at the ape R package. It has functions to compare phylogenetic trees. You could also simply represent each tree by a distance matrix and use the Mantel test (also in the ape package).
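
To make the Mantel idea concrete, here is a rough pure-Python sketch of a permutation Mantel test on two distance matrices. It is a generic version (Pearson correlation of the upper triangles, with rows and columns of one matrix permuted together) and is not meant to reproduce ape's implementation or its exact statistic. The function names and the `permutations` and `seed` parameters are my own assumptions. Both matrices are assumed to be symmetric, to list the taxa in the same order, and to be non-constant.

```python
import random
from math import sqrt


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a square matrix, with taxa taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel(m1, m2, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(m1)))
    x = upper_triangle(m1, identity)
    observed = pearson(x, upper_triangle(m2, identity))
    at_least = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the taxa of the second matrix
        if pearson(x, upper_triangle(m2, order)) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)


if __name__ == "__main__":
    d1 = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
    d2 = [[0, 2, 5, 5], [2, 0, 4, 6], [5, 4, 0, 1], [5, 6, 1, 0]]
    print(mantel(d1, d2))
```

Since branch lengths are not of interest here, the input matrices could hold topological distances (number of edges between each pair of tips) rather than patristic distances.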


Thank you for the suggestion. The difficulty with this method is that I'm comparing a tree built from a complete nucleotide alignment to a tree built from a subset of informative positions in that alignment, so the distances between taxa differ between the two trees.

The part that matters for me is whether the same partitions in the tree remain - I'm just trying to find a way to represent this numerically.


You could use a graph distance unrelated to genetics, e.g. shortest path or commute time. If you have multiple comparisons to do, another approach could be to measure similarity between the graphs directly using a graph kernel. These are global approaches and won't identify which parts are conserved. If you want to recover the conserved partitions themselves, you could look into algorithms for subtree isomorphism, such as this one.
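
As an illustration of the shortest-path idea, here is a small pure-Python sketch that counts the edges between every pair of tips with a breadth-first search and then compares two trees by the Euclidean distance between those path-length vectors. The adjacency-dictionary input format and the function names are assumptions made for this example. Tips are taken to be the nodes with exactly one neighbour, and both trees must use the same tip labels.

```python
from collections import deque
from math import sqrt


def leaf_path_lengths(adjacency):
    """Number of edges between every pair of tips (degree-1 nodes), found by BFS."""
    leaves = sorted(node for node, nbrs in adjacency.items() if len(nbrs) == 1)
    lengths = {}
    for start in leaves:
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in depth:
                    depth[nbr] = depth[node] + 1
                    queue.append(nbr)
        for end in leaves:
            if start < end:  # record each unordered pair once
                lengths[(start, end)] = depth[end]
    return lengths


def path_difference(adj_a, adj_b):
    """Euclidean distance between the tip-to-tip path-length vectors of two trees."""
    a, b = leaf_path_lengths(adj_a), leaf_path_lengths(adj_b)
    if a.keys() != b.keys():
        raise ValueError("trees must share the same tip labels")
    return sqrt(sum((a[pair] - b[pair]) ** 2 for pair in a))


if __name__ == "__main__":
    # Unrooted quartets AB|CD and AC|BD; "u" and "v" are internal nodes.
    ab_cd = {"A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
             "u": ["A", "B", "v"], "v": ["u", "C", "D"]}
    ac_bd = {"A": ["u"], "C": ["u"], "B": ["v"], "D": ["v"],
             "u": ["A", "C", "v"], "v": ["u", "B", "D"]}
    print(path_difference(ab_cd, ac_bd))
```

Like the other global measures, this gives a single number and does not say which partitions are conserved.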


This is very useful - thank you very much.

I'll look into these and let you know how I go.
