I would not rewrite a Newick parser from scratch unless I had a very good reason. Have you tried any of the available tree manipulation libraries (Bio::Phylo, ETE, etc.)?
Here is a very simple implementation using Python and ETE. You can customize the output as you wish.
```python
# Python 2 syntax (print statements): ete2 is the Python 2 release of ETE.
from ete2 import Tree

def children_names(n):
    """Comma-separated names of the direct children of node n."""
    return ','.join([ch.name for ch in n.children])

nw = "((ar, tk) 23 , (abc, ((xa, xb) 26 , ((lima, limb) 28 , (((spi, sfi) 31 , (eqc, (kme, ame) 33 ) 32 ) 30 , ((fxf, (tva, tvb) 36 ) 35 , (fxe, (cjk, ((mrb, (qq, rr) 41 ) 40 , sgn) 39 ) 38 ) 37 ) 34 ) 29 ) 27 ) 25 ) 24 ) 22 ;"
tree = Tree(nw, format=1)  # newick subformat 1 to read internal node names

for node in tree.traverse("postorder"):
    # A verbose output
    if not node.is_leaf() and node.up:  # For all internal nodes
        print node.name, "whose parent is", node.up.name, \
              "has", len(node.children), "children named", children_names(node), \
              "and groups the following leaves:", ",".join(node.get_leaf_names())
    elif not node.is_leaf():  # In case we are at the root of the tree
        print node.name, "who is the root of the tree", \
              "has", len(node.children), "children named", children_names(node), \
              "and groups the following leaves:", ",".join(node.get_leaf_names())
```
I know this is not the most efficient way to browse the content of internal branches, but if your trees are not huge, a few lines of code are more than enough. The script prints:
```
23 whose parent is 22 has 2 children named ar,tk and groups the following leaves: ar,tk
26 whose parent is 25 has 2 children named xa,xb and groups the following leaves: xa,xb
28 whose parent is 27 has 2 children named lima,limb and groups the following leaves: lima,limb
31 whose parent is 30 has 2 children named spi,sfi and groups the following leaves: spi,sfi
33 whose parent is 32 has 2 children named kme,ame and groups the following leaves: kme,ame
32 whose parent is 30 has 2 children named eqc,33 and groups the following leaves: eqc,kme,ame
30 whose parent is 29 has 2 children named 31,32 and groups the following leaves: spi,sfi,eqc,kme,ame
36 whose parent is 35 has 2 children named tva,tvb and groups the following leaves: tva,tvb
35 whose parent is 34 has 2 children named fxf,36 and groups the following leaves: fxf,tva,tvb
41 whose parent is 40 has 2 children named qq,rr and groups the following leaves: qq,rr
40 whose parent is 39 has 2 children named mrb,41 and groups the following leaves: mrb,qq,rr
39 whose parent is 38 has 2 children named 40,sgn and groups the following leaves: sgn,mrb,qq,rr
38 whose parent is 37 has 2 children named cjk,39 and groups the following leaves: cjk,sgn,mrb,qq,rr
37 whose parent is 34 has 2 children named fxe,38 and groups the following leaves: fxe,cjk,sgn,mrb,qq,rr
34 whose parent is 29 has 2 children named 35,37 and groups the following leaves: fxf,fxe,tva,tvb,cjk,sgn,mrb,qq,rr
29 whose parent is 27 has 2 children named 30,34 and groups the following leaves: spi,sfi,eqc,fxf,fxe,kme,ame,tva,tvb,cjk,sgn,mrb,qq,rr
27 whose parent is 25 has 2 children named 28,29 and groups the following leaves: lima,limb,spi,sfi,eqc,fxf,fxe,kme,ame,tva,tvb,cjk,sgn,mrb,qq,rr
25 whose parent is 24 has 2 children named 26,27 and groups the following leaves: xa,xb,lima,limb,spi,sfi,eqc,fxf,fxe,kme,ame,tva,tvb,cjk,sgn,mrb,qq,rr
24 whose parent is 22 has 2 children named abc,25 and groups the following leaves: abc,xa,xb,lima,limb,spi,sfi,eqc,fxf,fxe,kme,ame,tva,tvb,cjk,sgn,mrb,qq,rr
22 who is the root of the tree has 2 children named 23,24 and groups the following leaves: ar,tk,abc,xa,xb,lima,limb,spi,sfi,eqc,fxf,fxe,kme,ame,tva,tvb,cjk,sgn,mrb,qq,rr
```
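Calling `get_leaf_names()` on every internal node re-walks each subtree. A minimal sketch of the alternative, in plain Python with no ETE dependency: collect the leaves of every internal node during a single postorder pass, building each node's leaf list from its children's lists. The nested `(name, [children])` tuple representation, the `TREE` constant and the `collect_clades` function are hypothetical, chosen only for illustration (the tree is the example above, transcribed by hand); the same pattern should carry over to ETE node objects.

```python
# Hypothetical nested representation (not an ETE or BioPerl structure):
# a leaf is a string, an internal node is a (name, [children]) tuple.
# This is the example tree from the question, transcribed by hand.
TREE = ("22", [
    ("23", ["ar", "tk"]),
    ("24", ["abc",
        ("25", [
            ("26", ["xa", "xb"]),
            ("27", [
                ("28", ["lima", "limb"]),
                ("29", [
                    ("30", [("31", ["spi", "sfi"]),
                            ("32", ["eqc", ("33", ["kme", "ame"])])]),
                    ("34", [("35", ["fxf", ("36", ["tva", "tvb"])]),
                            ("37", ["fxe",
                                    ("38", ["cjk",
                                            ("39", [("40", ["mrb", ("41", ["qq", "rr"])]),
                                                    "sgn"])])])]),
                ]),
            ]),
        ]),
    ]),
])


def collect_clades(node, clades):
    """Return the leaf names under `node`, in left-to-right order.

    As a side effect, record clades[name] = leaf names for every internal
    node. Children are handled before their parent (postorder), so each node
    is visited once and its leaf list is assembled from its children's lists
    instead of re-walking the whole subtree.
    """
    if isinstance(node, str):  # a leaf: its only leaf is itself
        return [node]
    name, children = node
    leaves = []
    for child in children:
        leaves.extend(collect_clades(child, clades))
    clades[name] = leaves
    return leaves


if __name__ == "__main__":
    clades = {}
    collect_clades(TREE, clades)
    # Dicts keep insertion order (Python 3.7+), which here is postorder.
    for name, leaves in clades.items():
        print("%s groups the following leaves: %s" % (name, ",".join(leaves)))
```

This should list the internal nodes in the same postorder as the ETE script, although leaves inside a clade come out left to right (for example `mrb,qq,rr,sgn` for node 39), which may differ from ETE's ordering. Writing out every clade still costs time proportional to the total size of all clades; the saving is that no subtree is traversed more than once. The recursion is fine for typical trees but could hit Python's recursion limit on extremely deep, ladder-like ones.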
Hey, thanks. Do you know a script compatible with BioPerl modules that would do what I want?
The Bioperl Phylogenetics and Analysis Scrapbook has a code snippet that returns all clades in a tree and should be useful: http://www.bioperl.org/wiki/Finding_all_clades_represented_in_a_tree. It only shows the leaf taxa that form each clade represented in the tree, but it could easily be modified to show the names of internal nodes as well, if desired.