7.8 years ago
sacha
Hi,
I downloaded gg_13_5_otus_99_annotated.tree.gz from http://greengenes.secondgenome.com/downloads/database/13_5 . It contains a Newick tree of the 16S rRNA taxonomy. I would like to extract a simple phylogenetic tree from it, for example the relationship between g__Staphylococcus, g__Streptococcus and g__Enterococcus. The problem is that each species has too many leaves labeled with numbers, which are probably sequence IDs.
# This command prints the labels of all nodes except the leaves. This gives me the whole taxonomy
nw_labels -L gg_13_5_otus_99_annotated.tree
# Output :
s__sedula
g__Metallosphaera
g__Acidianus
g__Stygiolobus
s__metallicus
g__Sulfolobus
g__Sulfolobus
......
# This command prints only the leaf labels. I get a list of numbers
nw_labels -I gg_13_5_otus_99_annotated.tree
# Output :
550922
1113159
569299
1106705
1104518
556057
3119364
So I want to remove all the leaves, in order to get a tree without the sequence nodes.
Any idea how to do this with Newick Utilities or something else?
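To make the goal concrete, here is a minimal Python sketch of the operation I have in mind. It is only an illustration under several assumptions, not a tested solution for the full Greengenes file: the function names `parse_newick` and `strip_leaves` are made up for this example, branch lengths are discarded, inner nodes that end up with no label and no remaining children are dropped, and unlabeled inner nodes with a single remaining child are collapsed. Any numeric support values stored as inner-node labels would be kept as ordinary labels.

```python
import re

# Tokens: quoted labels, structural characters, or unquoted labels/numbers.
TOKEN = re.compile(r"\s*('(?:[^']|'')*'|[(),:;]|[^(),:;'\s]+)")


class Node:
    __slots__ = ("label", "children", "text")

    def __init__(self):
        self.label = ""
        self.children = []
        self.text = None


def parse_newick(newick):
    """Parse a Newick string into Node objects (branch lengths are ignored)."""
    root = Node()
    node, stack, prev = root, [], None
    for tok in TOKEN.findall(newick):
        if tok == "(":
            child = Node()
            node.children.append(child)
            stack.append(node)
            node = child
        elif tok == ",":
            node = Node()
            stack[-1].children.append(node)
        elif tok == ")":
            node = stack.pop()
        elif tok == ";":
            break
        elif tok != ":" and prev != ":":
            node.label = tok  # a token right after ':' is a branch length
        prev = tok
    return root


def strip_leaves(root):
    """Return a Newick string without the original leaves.

    Uses an explicit stack (post-order) to avoid recursion limits on deep trees.
    """
    stack = [(root, False)]
    while stack:
        node, visited = stack.pop()
        if not visited:
            stack.append((node, True))
            stack.extend((c, False) for c in node.children)
            continue
        if not node.children:
            node.text = None  # original leaf (sequence ID): drop it
            continue
        kept = [c.text for c in node.children if c.text is not None]
        if len(kept) == 1 and not node.label:
            node.text = kept[0]  # collapse unlabeled single-child node
        elif kept:
            node.text = "(" + ",".join(kept) + ")" + node.label
        else:
            node.text = node.label or None  # becomes a new leaf, or vanishes
    return (root.text or "") + ";"


example = "((1,2)g__A,(3,(4,5)g__B)f__X,6)root;"
print(strip_leaves(parse_newick(example)))  # (g__A,(g__B)f__X)root;
```

For the real file, the tree text would first have to be read from the decompressed .tree file (or through Python's `gzip.open` in text mode) and passed to `parse_newick`. I have not checked how this behaves on the actual Greengenes labels.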