... or how to automatically detect "novelty" in phylogenetic trees?
Today I have a question about one of my weak spots, phylogenetics. Sorry if it is a bit vague.
Imagine we have a set (>10k) of automatically generated (unrooted) trees for groups of homologous genes for a newly sequenced organism. Assume the tree generation is more or less sane (which is possibly not true for all trees ;) ).
- What criteria could be used to screen for genes with "interesting" characteristics in their trees for further study? I am looking for indicators of, e.g., horizontal gene transfer, co-evolution, convergent evolution, or differences in selective pressure.
- What would be good criteria for filtering out crap trees? I am sure they exist, for example trees that look like a comb with all branches of equal length, possibly because of errors in orthologue selection.
- Which algorithm or software could be used to accomplish the above?
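To make the "all branches have equal length" criterion concrete, here is a rough sketch of what I have in mind. It assumes the trees are available as Newick strings with branch lengths and without quoted labels; the function names and the `min_cv` cutoff of 0.05 are made up for illustration and not taken from any existing tool.

```python
import re
import statistics

# Matches the number after a ':' in a Newick string, e.g. ':0.12' or ':1e-3'.
# Assumption: labels are unquoted and contain no ':' themselves.
_BRANCH_LENGTH = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Return all branch lengths found in a Newick string as floats."""
    return [float(x) for x in _BRANCH_LENGTH.findall(newick)]


def branch_length_cv(newick):
    """Coefficient of variation (std / mean) of the branch lengths.

    Returns 0.0 when there are fewer than two lengths or the mean is zero.
    """
    lengths = branch_lengths(newick)
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    if mean == 0:
        return 0.0
    return statistics.pstdev(lengths) / mean


def looks_uninformative(newick, min_cv=0.05):
    """Flag trees whose branch lengths are (almost) all identical.

    min_cv is an arbitrary, hypothetical threshold that would need tuning.
    """
    return branch_length_cv(newick) < min_cv


flat_tree = "((A:0.1,B:0.1):0.1,(C:0.1,D:0.1):0.1);"
varied_tree = "((A:0.02,B:0.3):0.1,(C:0.5,D:0.01):0.05);"
print(looks_uninformative(flat_tree), looks_uninformative(varied_tree))
```

This only captures the equal-branch-length symptom. Whether it is a good proxy for bad orthologue selection is exactly the part I am unsure about.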
My ideas:
- Define a consensus evolutionary tree (either from the literature, from a set of selected genes, or from all genes) and calculate the tree distance to it (e.g. the Robinson-Foulds metric). Possibly normalize by the number of branches?
- Use taxonomic information, e.g. to detect "out-of-clade" trees, where "my organism" is situated closest to members of a different clade than it has been classified in.
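For the first idea, a minimal sketch of a normalized Robinson-Foulds distance between two unrooted trees could look like the following. It is a toy implementation in plain Python, not the API of any phylogenetics library. It assumes Newick input with unquoted labels, that both trees contain exactly the same leaves, and that dividing by 2(n - 3), the maximum for two fully resolved unrooted trees with n leaves, is an acceptable normalization. All function names are hypothetical.

```python
import re


def parse_newick(newick):
    """Parse a Newick string into nested lists of leaf names.

    Branch lengths and internal labels/support values are discarded.
    """
    tokens = re.findall(r"[(),]|[^(),;]+", newick.strip())
    stack = [[]]
    prev = None
    for tok in tokens:
        if not tok.strip():
            continue  # ignore whitespace between tokens
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        elif tok != "," and prev != ")":
            # A leaf such as 'A:0.1'; text right after ')' is an internal label.
            stack[-1].append(tok.split(":")[0].strip())
        prev = tok
    return stack[0][0]


def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def bipartitions(tree):
    """Return the non-trivial splits of the leaf set induced by internal edges."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # fixed reference leaf makes each split canonical
    splits = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            splits.add(below if ref in below else other)
        return below

    walk(tree)
    return splits


def robinson_foulds(newick_a, newick_b):
    """Return (RF distance, RF normalized by 2 * (n - 3))."""
    tree_a, tree_b = parse_newick(newick_a), parse_newick(newick_b)
    leaves = leaf_set(tree_a)
    if leaves != leaf_set(tree_b):
        raise ValueError("trees must have identical leaf sets")
    rf = len(bipartitions(tree_a) ^ bipartitions(tree_b))
    max_rf = 2 * (len(leaves) - 3)
    return rf, (rf / max_rf if max_rf > 0 else 0.0)


reference = "((A,B),(C,D),E);"
same_topology = "(A:0.1,B:0.2,(E:0.3,(C:0.1,D:0.1)85:0.02)90:0.05);"
conflicting = "((A,C),(B,D),E);"
print(robinson_foulds(reference, same_topology))
print(robinson_foulds(reference, conflicting))
```

In practice each gene tree would first have to be pruned to the taxa it shares with the reference tree, since gene families rarely cover every species. Trees with a high normalized distance would then be the candidates for a closer look.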
I searched for +"novelty detection" phylogenetic tree and similar terms, but I didn't spot an article where this was applied automatically.