Hi, I am new to the PhyloXML format and have a question about parsing files to split gene trees at duplication nodes. I have 9000 files, each containing a gene tree representing the evolution of a gene family. I want to split each gene tree at each duplication node. My question is simply: how do you isolate the subtree at a duplication node (event type duplication)? Apologies if this is obvious; I cannot find any guidance or examples on this online.
Hi Etal, thanks, that is exactly what I was looking for! Is it possible to then remove from the main gene tree any subtrees that have fewer than, for example, 3 species? Because of the number of gene trees I have, I am looking to automate the process of throwing out subtrees that do not meet a certain threshold. I assume this can be done using another if statement, but can Biopython pick out species using the tree methods you mentioned, or would I have to define the species names? Thanks.
It depends on how your tree is structured, but if each leaf of the tree represents a species, then you can count the terminal nodes under each clade, e.g. change the middle line to:
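Here is a minimal sketch of that kind of check. It is not the original snippet, just one way to write it. The function names and the `MIN_LEAVES` threshold are made up for illustration, and it assumes duplication nodes are marked with a non-zero `<duplications>` count in the clade's `<events>` element, which may not match how your files are annotated. It only relies on the `find_clades()` and `count_terminals()` methods that Bio.Phylo trees and clades provide.

```python
MIN_LEAVES = 3  # hypothetical threshold, taken from the example in the question


def is_duplication(clade):
    """Return True if the clade carries a non-zero duplication count.

    Assumption: duplications are recorded as <events><duplications>N</duplications>,
    which Bio.Phylo exposes as clade.events.duplications on PhyloXML clades.
    """
    events = getattr(clade, "events", None)
    return events is not None and bool(getattr(events, "duplications", None))


def large_duplication_subtrees(tree, min_leaves=MIN_LEAVES):
    """Yield duplication clades that have at least min_leaves terminal nodes."""
    for clade in tree.find_clades():
        # The extra condition on count_terminals() is the "changed middle line".
        if is_duplication(clade) and clade.count_terminals() >= min_leaves:
            yield clade
```

With Biopython installed, the trees would come from something like `Phylo.parse(path, "phyloxml")`, and each clade the generator yields can be treated as a subtree in its own right.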
Otherwise, if you need to do more checks on each terminal to extract the species name, it looks like:
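A rough sketch of that per-terminal check is below. It is a guess at the general shape rather than the original code. The helper names are made up, and the fallback of taking the text after the last underscore in the leaf name (e.g. `GENE1_HUMAN`) is purely an assumption about how your leaves are labelled, so adjust it to your files. It looks at `taxonomies` first, since PhyloXML clades in Bio.Phylo keep their `<taxonomy>` elements there, with `scientific_name` and `code` attributes.

```python
def species_of(leaf):
    """Best-effort species label for a terminal clade, or None if unknown."""
    # Prefer an explicit <taxonomy> annotation when the file provides one.
    for tax in getattr(leaf, "taxonomies", None) or []:
        label = tax.scientific_name or tax.code
        if label:
            return label
    # Assumption: otherwise the leaf name looks like "GENEID_SPECIES".
    name = leaf.name
    if name and "_" in name:
        return name.rsplit("_", 1)[1]
    return name


def count_species(clade):
    """Count distinct species labels among the terminals under a clade."""
    labels = {species_of(leaf) for leaf in clade.get_terminals()}
    labels.discard(None)  # leaves with no usable label are ignored
    return len(labels)


def has_enough_species(clade, min_species=3):
    """True if the clade covers at least min_species distinct species."""
    return count_species(clade) >= min_species
```

Counting distinct labels rather than terminals matters once a subtree contains several paralogs from the same species, which is common below a duplication node.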