Question

Subdivide phylogenetic trees

0

8.6 years ago

spaul8505 ▴ 20

I have a phylogenetic tree for each gene family, about 2000 trees in total. I would like to find out whether these trees all share the same topology or whether there is incongruence, by sorting them into categories. I tried doing this with grep and did capture every pattern in which a tree can be written, but I am sure there are better ways, possibly clustering or other methods. Has anyone done this before?

For instance, if I use grep:

            ((X, Y), Z), ((Y, X), Z), (Z, (X, Y)) and (Z, (Y, X))

These four trees are picked up as four different patterns, whereas they are all equivalent and should be counted as one kind of tree.

RAxML clustering • 2.2k views

ADD COMMENT • link 8.6 years ago by spaul8505 ▴ 20

1

Categories based upon what?

Personally I like the ete3 toolkit for comparing trees, if that's of any help. It can compute the Robinson-Foulds distance between all your trees. It should (I think) score your 4 examples as identical (RF = 0), since they differ only in the order in which the branches are written. I'm not sure if you want more 'depth' than that, though.
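
For rooted trees, the idea can be sketched in plain Python without any library: collect the clades (the leaf set under each internal node) of each tree and count the clades that occur in only one of the two. This is only a minimal sketch of a rooted Robinson-Foulds distance, not ete3's implementation; the helper names `parse_newick`, `clades` and `rf_distance` are made up for illustration, and the parser assumes unquoted taxon names.

```python
import re

SPECIAL = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (lengths dropped)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label / branch length such as ":0.3"
            if pos < len(tokens) and tokens[pos] not in SPECIAL:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]        # leaf name without branch length

    return node()

def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a rooted tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found

def rf_distance(newick_a, newick_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(parse_newick(newick_a)) ^ clades(parse_newick(newick_b)))

print(rf_distance("((X, Y), Z)", "(Z, (Y, X))"))   # same topology
print(rf_distance("((X, Y), Z)", "((X, Z), Y)"))   # different topology
```

Trees that differ only in how the branches are written share all their clades, so the first call gives 0; the second pair differs in one clade on each side.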

ADD REPLY • link 8.6 years ago by Joe 22k

0

Based on the topology. Out of the 2000 trees, I would like to see which major patterns occur. For instance, if there are 4 such topologies, I would like to know how many times each is observed out of 2000. And since the 4 forms below are equivalent, I want a single summed count of the trees that fall into this category.

        ((X, Y), Z)
        ((Y, X), Z)
        (Z, (X, Y)) 
        (Z, (Y, X))
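
One way to get such a summed count is to rewrite every tree in a canonical form before counting, so that equivalent spellings collapse to the same string. The sketch below sorts the children of each node recursively and tallies the results with a `Counter`; `parse_newick`, `canonical` and `count_topologies` are hypothetical helper names, the trees are treated as rooted, and taxon names are assumed to be unquoted.

```python
import re
from collections import Counter

SPECIAL = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (lengths dropped)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label / branch length
            if pos < len(tokens) and tokens[pos] not in SPECIAL:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]

    return node()

def canonical(tree):
    """Return a Newick-like string with the children of every node sorted."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"

def count_topologies(newicks):
    """Count how many of the given Newick strings share each rooted topology."""
    return Counter(canonical(parse_newick(n)) for n in newicks)

trees = ["((X, Y), Z)", "((Y, X), Z)", "(Z, (X, Y))", "(Z, (Y, X))", "((X, Z), Y)"]
for topology, n in count_topologies(trees).most_common():
    print(n, topology)
```

With the four equivalent forms plus one different tree as input, the four collapse into a single entry with a count of 4.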

ADD REPLY • link 8.6 years ago by spaul8505 ▴ 20

0

So you already know what topologies you're expecting, if you're searching for them manually?

ADD REPLY • link 8.6 years ago by Joe 22k

0

No, I did something like this:

 # print each file name, then its tree with spaces, digits and dots stripped
 for f in file{1..2000}; do
     printf "%s\t" "$f"; tr -d '[ 0-9.]' < "$f"
 done |
 sort -k2

 file1   (((A:,B:):,C:):,(D:,((E:,F:):,G:):):,H:):;
 file2   ((A:,(B:,C:):):,(((F:,E:):,G:):,D:):,H:):;

......

I am looking for software to do this. I checked out Phylip's consense; although it outputs the number of times a partition occurs, it does not specify which trees contain it. I have rooted trees.
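
The missing piece (which trees contain a given partition) can be sketched in plain Python by indexing every clade against the names of the trees it occurs in. This is an illustrative sketch only, not what consense does internally; `parse_newick`, `clades` and `trees_by_clade` are hypothetical names, the trees are treated as rooted, and the two example entries are the stripped trees shown above.

```python
import re
from collections import defaultdict

SPECIAL = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (lengths dropped)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label / branch length (here just ":")
            if pos < len(tokens) and tokens[pos] not in SPECIAL:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]

    return node()

def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a rooted tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found

def trees_by_clade(trees):
    """Map each clade to the names of the trees that contain it."""
    index = defaultdict(list)
    for name, newick in trees.items():
        for clade in clades(parse_newick(newick)):
            index[clade].append(name)
    return index

trees = {
    "file1": "(((A:,B:):,C:):,(D:,((E:,F:):,G:):):,H:):;",
    "file2": "((A:,(B:,C:):):,(((F:,E:):,G:):,D:):,H:):;",
}
index = trees_by_clade(trees)
# most frequent clades first, each followed by the trees that contain it
for clade, names in sorted(index.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0]))):
    print(len(names), ",".join(sorted(clade)), names)
```

For the two example files, the clade A,B is reported for file1 only and B,C for file2 only, while E,F and the other shared clades list both files.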

ADD REPLY • link 8.6 years ago by spaul8505 ▴ 20

0

You could use what jrj.healey suggested. If you compute the RF distance for every pair of trees, you can then group together the trees with RF = 0 (same topology).
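
A rough sketch of that grouping, assuming rooted trees and a clade-based RF distance: because RF = 0 is transitive, each tree only needs to be compared against one representative per existing group rather than against every other tree. The names `parse_newick`, `clades` and `group_rf_zero` are hypothetical, and the parser assumes unquoted taxon names.

```python
import re

SPECIAL = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse a Newick string into nested tuples of leaf names (lengths dropped)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            # skip an optional internal label / branch length
            if pos < len(tokens) and tokens[pos] not in SPECIAL:
                pos += 1
            return tuple(children)
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]

    return node()

def clades(tree):
    """Return the set of clades (frozensets of leaf names) of a rooted tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found

def group_rf_zero(trees):
    """Group tree names so that all trees in a group have rooted RF = 0."""
    groups = []                           # list of (representative clades, names)
    for name, newick in trees.items():
        own = clades(parse_newick(newick))
        for representative, members in groups:
            if len(representative ^ own) == 0:   # RF distance of zero
                members.append(name)
                break
        else:
            groups.append((own, [name]))  # first tree with this topology
    return [members for _, members in groups]

trees = {
    "t1": "((X, Y), Z)",
    "t2": "((Y, X), Z)",
    "t3": "(Z, (X, Y))",
    "t4": "(Z, (Y, X))",
    "t5": "((X, Z), Y)",
}
for members in sorted(group_rf_zero(trees), key=len, reverse=True):
    print(len(members), members)
```

With 2000 trees a full pairwise matrix means about two million comparisons, so comparing against group representatives keeps the work proportional to the number of distinct topologies.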

ADD REPLY • link 8.6 years ago by abascalfederico ★ 1.2k