7.8 years ago
spaul8505
I have phylogenetic trees for each gene family, about 2000 trees in total. Now I would like to see whether all of those trees are similar or whether there is incongruence, by dividing them into different categories. I tried using grep to do this and did grab all possible occurrences/patterns of the tree, but I am sure there are other ways to do it, possibly using clustering or other methods. Has anyone done this before?
For instance, if I use grep:
((X, Y), Z), ((Y, X), Z), (Z, (X, Y)) and (Z, (Y, X))
These four trees are grabbed as four different patterns, whereas they are all equivalent and should be considered one kind of tree.
Categories based upon what?
I like the ete3-toolkit for comparing trees personally, if that's of any help. It'll compute the Robinson-Foulds metric of congruence between all your trees. It should (I think) score your 4 examples as equivalently congruent since they all take the same number of topological transformations. I'm not sure if you want more 'depth' than that, though.
Based on the topology, I would like to see what the major patterns are among the 2000 trees. For instance, if there are 4 such topologies, then I would like to see how many times they are observed out of 2000. And, since these 4 types are equivalent, I want a summed-up number of the trees that fall into this category.
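One way to get that summed-up count without listing every grep pattern is to rewrite each rooted tree into a canonical form, with the children of every node sorted, and then count identical strings. The following is only a minimal sketch in plain Python, not ete3's API. It assumes topology-only Newick strings (no branch lengths, support values or internal node labels), and the helper names `parse_newick` and `canonical` are made up for illustration.

```python
from collections import Counter

def parse_newick(text):
    """Parse a topology-only Newick string into nested tuples of leaf names.
    Assumption: no branch lengths, support values or internal node labels."""
    s = text.strip().rstrip(";").replace(" ", "")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # skip "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
            return tuple(children)
        start = pos                       # otherwise read a leaf name
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()

def canonical(tree):
    """Return a string in which the children of every node are sorted,
    so that rotating a node does not change the result."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canonical(child) for child in tree)) + ")"

# The four equivalent trees from the question plus one different topology.
trees = ["((X, Y), Z)", "((Y, X), Z)", "(Z, (X, Y))", "(Z, (Y, X))", "((X, Z), Y)"]
counts = Counter(canonical(parse_newick(t)) for t in trees)

for topology, n in counts.most_common():
    print(n, topology)
```

Here the four equivalent trees collapse into a single entry with a count of 4, and the fifth tree gets its own entry. For real files with branch lengths, those would need to be stripped first or handled by a proper Newick parser.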
So you already know what topologies you're expecting, if you're searching for them manually?
No, I did something like this
......
I am looking for software to do this. I checked out Phylip's consense; although it outputs the number of times a partition occurs, it does not specify which trees they are. I have rooted trees.
You could use what jrj.healey suggested. If you compute RF distances for all tree pairs, you would be able to make groups of trees that have RF=0 (same topology).
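The RF=0 grouping can be sketched as follows for rooted trees, where the distance is taken as the number of clades (sets of leaves under an internal node) found in one tree but not the other. This is a plain-Python illustration under assumptions, not ete3 code: it expects topology-only Newick strings, and `clades` and `rf_distance` are hypothetical helper names. Since RF=0 means identical clade sets, the grouping itself does not need all pairwise comparisons; the clade set can be used directly as a dictionary key.

```python
from collections import defaultdict

def clades(newick):
    """Return the clades of a rooted tree as a frozenset of frozensets of leaf names.
    Assumption: topology-only Newick (no branch lengths or internal labels)."""
    text = newick.strip().rstrip(";").replace(" ", "")
    stack, found, name = [], set(), ""
    for ch in text:
        if ch in "(),":
            if name:                          # a leaf name just ended
                stack[-1].add(name)
                name = ""
            if ch == "(":
                stack.append(set())           # open a new internal node
            elif ch == ")":
                clade = frozenset(stack.pop())
                found.add(clade)
                if stack:                     # pass the leaves up to the parent
                    stack[-1].update(clade)
        else:
            name += ch
    return frozenset(found)

def rf_distance(a, b):
    """Rooted RF distance: number of clades present in exactly one of the two trees."""
    return len(clades(a) ^ clades(b))

trees = ["((X, Y), Z)", "((Y, X), Z)", "(Z, (X, Y))", "(Z, (Y, X))", "((X, Z), Y)"]

# Trees with RF = 0 share the same clade set, so group on it directly.
groups = defaultdict(list)
for index, tree in enumerate(trees):
    groups[clades(tree)].append(index)

for members in groups.values():
    print(len(members), "trees:", members)
```

With 2000 trees this makes one pass over the input, and the list of indices in each group answers the question of which trees share a topology, which consense does not report.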