What format are GTDB-Tk newick trees, and how do I extract a specific node?
3.8 years ago
O.rka ▴ 740

I'm trying to read the tree in the classify/ directory produced by GTDB-Tk with ete3, but I get an error:

In [50]: tree = ete3.Tree("./classify/gtdbtk.bac120.classify.tree", format=0, quoted_node_names=True)
---------------------------------------------------------------------------
NewickError                               Traceback (most recent call last)
<ipython-input-50-913e9702b09b> in <module>
----> 1 tree = ete3.Tree("./classify/gtdbtk.bac120.classify.tree", format=0, quoted_node_names=True)

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/coretype/tree.py in __init__(self, newick, format, dist, support, name, quoted_node_names)
    208         if newick is not None:
    209             self._dist = 0.0
--> 210             read_newick(newick, root_node = self, format=format,
    211                         quoted_names=quoted_node_names)
    212

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/parser/newick.py in read_newick(newick, root_node, format, quoted_names)
    249             raise NewickError('Unexisting tree file or Malformed newick tree structure.')
    250         else:
--> 251             return _read_newick_from_string(nw, root_node, matcher, format, quoted_names)
    252
    253     else:

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/parser/newick.py in _read_newick_from_string(nw, root_node, matcher, formatcode, quoted_names)
    324                     closing_internal =  closing_internal.rstrip(";")
    325                     # read internal node data and go up one level
--> 326                     _read_node_data(closing_internal, current_parent, "internal", matcher, formatcode)
    327                     current_parent = current_parent.up
    328

/usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/soothsayer_py3.8_env/lib/python3.8/site-packages/ete3/parser/newick.py in _read_node_data(subnw, current_node, node_type, matcher, formatcode)
    428             _parse_extra_features(node, data[2])
    429     else:
--> 430         raise NewickError("Unexpected newick format '%s' " %subnw[0:50])
    431     return
    432

NewickError: Unexpected newick format 'ete3_quotref_1:0.09438'
You may want to check other newick loading flags like 'format' or 'quoted_node_names'.
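
In ete3, format=0 reads the label after a closing parenthesis as a numeric support value. `ete3_quotref_1` appears to be the placeholder ete3 substitutes for a quoted string when quoted_node_names=True, so the failure suggests that the internal nodes carry quoted, non-numeric labels. One way to check which labels the file actually contains is to list them without ete3. Below is a rough stdlib sketch. `internal_labels` and `is_number` are hypothetical helpers, not part of ete3 or GTDB-Tk, and the toy tree with its `'100.0:g__Foo'` label is made up for illustration.

```python
from typing import List


def internal_labels(newick: str) -> List[str]:
    """Return the label written right after each ')' in a Newick string.

    Single-quoted labels are returned without their quotes, and the
    ':<branch length>' suffix is not included. Unlabeled internal nodes
    give "". Doubled '' escapes inside quoted labels are not handled.
    """
    labels = []
    i, n = 0, len(newick)
    while i < n:
        ch = newick[i]
        if ch == "'":
            # Skip a quoted leaf label so brackets inside it are ignored.
            i = newick.index("'", i + 1) + 1
        elif ch == ")":
            i += 1
            if i < n and newick[i] == "'":
                # Quoted internal label: take everything up to the closing quote.
                end = newick.index("'", i + 1)
                labels.append(newick[i + 1:end])
                i = end + 1
            else:
                # Unquoted internal label: read up to the next delimiter.
                end = i
                while end < n and newick[end] not in ":,();":
                    end += 1
                labels.append(newick[i:end])
                i = end
        else:
            i += 1
    return labels


def is_number(label: str) -> bool:
    """True if the label parses as a float, i.e. a plain support value."""
    try:
        float(label)
    except ValueError:
        return False
    return True


# Hypothetical toy tree: one quoted "support:taxon" label, one bare support.
toy = "((A:0.1,B:0.2)'100.0:g__Foo':0.05,(C:0.1,D:0.1)1.0:0.2);"
print(internal_labels(toy))  # -> ['100.0:g__Foo', '1.0', '']
```

To apply this to the real file, pass the contents of ./classify/gtdbtk.bac120.classify.tree to `internal_labels` and keep `[lab for lab in labels if lab and not is_number(lab)]`. Any labels that survive that filter are the kind format=0 cannot read as support values.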

I was able to read the tree successfully with format=1, but then some nodes are named 1.0, which I don't think is correct:

In [49]: p_node.get_common_ancestor(["MAG_883.8", "MAG_883.14"])
Out[49]: Tree node '1.0' (0x7f810794d82)
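
With format=1, ete3 stores whatever label follows a closing parenthesis as the node name, so a node named `1.0` is most likely a support value rather than a taxon. The sketch below assumes that internal labels are either a bare support value or `support:taxon`; this is an assumption, so inspect the real labels first. It splits a label into its parts and walks up via `.up` to the nearest ancestor that carries a taxon. `split_label`, `nearest_taxon` and `ToyNode` are hypothetical names. Only the `.name` and `.up` attributes mirror ete3's TreeNode.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def split_label(label: str) -> Tuple[Optional[float], Optional[str]]:
    """Split an internal-node label into (support, taxon).

    ASSUMED label shapes (check them against your own tree):
      "1.0"                 -> (1.0, None)
      "100.0:p__Firmicutes" -> (100.0, "p__Firmicutes")
      anything non-numeric  -> (None, label)
    """
    if not label:
        return None, None
    head, sep, tail = label.partition(":")
    try:
        support = float(head)
    except ValueError:
        # No numeric prefix: keep the whole label as the taxon part.
        return None, label
    taxon = tail.strip() if sep else ""
    return support, (taxon or None)


def nearest_taxon(node) -> Optional[str]:
    """Walk from `node` to the root via `.up` and return the first taxon
    found in a label. Intended for internal nodes: a leaf's own name would
    be returned as the 'taxon' by the non-numeric rule above."""
    while node is not None:
        _, taxon = split_label(node.name or "")
        if taxon:
            return taxon
        node = node.up
    return None


@dataclass
class ToyNode:
    """Stand-in exposing the two attributes used above: `name` and `up`."""
    name: str
    up: Optional["ToyNode"] = None


root = ToyNode("100.0:p__Firmicutes")
lca = ToyNode("1.0", up=root)  # e.g. what get_common_ancestor returned
print(split_label(lca.name), nearest_taxon(lca))
```

Because `nearest_taxon` only reads `.name` and `.up`, it should also accept the node returned by `p_node.get_common_ancestor([...])`. This assumes the tree was loaded so that quoted labels stay whole, for example with `format=1, quoted_node_names=True`, which is untested here.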