Hi,
I have a Newick tree:
'(((61082:1,(764031:1,((386100:1,908211:1)1:1,(764033:1,(252962:1,121494:1)1:1)1:1)1:1)1:1)1:1,((1041945:1,(908214:1,252963:1)1:1)1:1,(121492:1,((450361:1,764034:1)1:1,(908212:1,(908213:1,908215:1)1:1)1:1)1:1)1:1)1:1)1:1,(((479641:1,((1313225:1,479639:1)1:1,467775:1)1:1)1:1,(((289401:1,289398:1)1:1,((253172:1,936147:1)1:1,479643:1)1:1)1:1,(((479640:1,153946:1)1:1,(281489:1,364019:1)1:1)1:1,((((400682:1,178514:1)1:1,((178539:1,178552:1)1:1,((6052:1,681720:1)1:1,(882799:1,(289074:1,394683:1)1:1)1:1)1:1)1:1)1:1,(458493:1,(283497:1,344322:1)1:1)1:1)1:1,333317:1)1:1)1:1)1:1)1:1,((479638:1,(55567:1,233783:1)1:1)1:1,(458489:1,36754:1)1:1)1:1)1:1);'
For each leaf node, I want to get a list of closely related leaf nodes using the ete2 Python package.
How can I do that?
Could you explain what you mean by "closely related"? Close by branch distance, by topology...?
I mean it in this sense: how large can the topological distance be before two leaves no longer count as closely related (using tree.get_distance(node1, node2, topology_only=True) in the ete2 package in Python)?
This will get you the number of branches that separate two nodes. The cut-off for "closely related" is up to you, and it will depend on many factors. In general, I would say that branch length is a better proxy than topological distance (so, turn off the topology_only flag). This question is somewhat related: Which cut-off for collapsing this tree?
I think in my tree, the branch lengths are always equal to 1.
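To make the cut-off idea concrete, here is a minimal sketch in plain Python rather than ete2, so it runs without any extra package. It counts the edges between every pair of leaves and keeps the pairs at or below a chosen cut-off; when every branch length is 1, this edge count is the same as the branch-length distance. The helper names (parse_newick, topological_distance, close_leaves) and the toy tree are made up for illustration, and the tiny parser only handles simple Newick strings like the one in the question.

```python
import re

def parse_newick(newick):
    """Parse a simple Newick string into (parent, name) dicts.

    parent maps node id -> parent id (None for the root);
    name maps leaf node id -> leaf label. Branch lengths and
    internal-node labels are ignored.
    """
    parent, name = {}, {}
    next_id = 0
    current = None   # innermost internal node that is still open
    prev = ""
    for token in re.findall(r"[(),;]|[^(),;]+", newick.strip()):
        if token == "(":
            parent[next_id] = current
            current = next_id
            next_id += 1
        elif token == ")":
            current = parent[current]
        elif token not in ",;" and prev != ")":
            # A label right after "(" or "," is a leaf, e.g. "61082:1".
            parent[next_id] = current
            name[next_id] = token.split(":")[0]
            next_id += 1
        # A label right after ")" belongs to an internal node: skipped.
        prev = token
    return parent, name

def path_to_root(node, parent):
    """List of node ids from node up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def topological_distance(a, b, parent):
    """Number of edges on the path between nodes a and b."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:        # first shared ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")

def close_leaves(newick, cutoff):
    """Map each leaf name to the other leaves within `cutoff` edges."""
    parent, name = parse_newick(newick)
    return {
        name[a]: sorted(
            name[b] for b in name
            if b != a and topological_distance(a, b, parent) <= cutoff
        )
        for a in name
    }

# Toy tree (hypothetical labels), same style as the tree in the question.
toy = "((A:1,B:1)1:1,(C:1,(D:1,E:1)1:1)1:1);"
print(close_leaves(toy, 2))
print(close_leaves(toy, 3))
```

With a cut-off of 2 only sister leaves are grouped together; raising it to 3 also pulls in leaves one level further up. Which value is sensible still depends on the tree, as said above.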
Your tree seems to be based on NCBI taxonomy IDs. A good strategy would be to group closely related leaves based on their rank in the taxonomy database (i.e. same genus/family).
I wrote some scripts to query the NCBI taxonomy tree that may be of interest to you: https://github.com/jhcepas/ncbi_taxonomy
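As a rough sketch of the rank-based grouping, assuming you already have a lookup from taxonomy ID to the name of the chosen rank: the dictionary below is filled in by hand for three well-known IDs (human, cattle, chicken) at the class level, purely for illustration. In practice the lookup would come from the NCBI taxonomy database, and the function name related_by_rank is made up here.

```python
from collections import defaultdict

# Hand-filled for illustration only: taxid -> taxonomic class.
# 9606 = Homo sapiens, 9913 = Bos taurus, 9031 = Gallus gallus.
rank_of = {9606: "Mammalia", 9913: "Mammalia", 9031: "Aves"}

def related_by_rank(taxids, rank_of):
    """Map each taxid to the other taxids that share its rank label."""
    groups = defaultdict(list)
    for taxid in taxids:
        groups[rank_of[taxid]].append(taxid)
    return {
        taxid: [other for other in groups[rank_of[taxid]] if other != taxid]
        for taxid in taxids
    }

print(related_by_rank([9913, 9031, 9606], rank_of))
```

Choosing a lower rank (genus, family) gives smaller, tighter groups; a higher rank gives larger ones.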
Thank you.
Yes, my tree is a bifurcated version of the NCBI tree, with the taxonomy IDs as leaf names (only the Metazoan tree).
Do you think using:
python ./ncbi_query.py -t 9913 9031 9606 -x
will help me get what I want?