Inserting Sequences In A Phylogeny
14.0 years ago

Hi all,

I was wondering whether the following problem has been considered in the literature. Let T be a typical, binary, leaf-labelled phylogenetic tree, with leaves x1, x2, ..., xn, and suppose I receive a new sequence y that is supposed to be related to the sequences I already have. Can one determine whether y can be inserted in that tree, either as an ancestor of my available sequences or as a new leaf, and if multiple choices are possible, what is (are) the most likely option(s)?

algorithm phylogenetics • 6.1k views
14.0 years ago
Dave Lunt ★ 2.0k

I agree with those above that re-evaluating the tree sometimes gives better results, but it may not be the best solution when you have many sequences to add to a big tree. Have a look at pplacer, which describes itself like this:

"Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment. Pplacer is designed to be fast, to give useful information about uncertainty, and to offer advanced visualization and downstream analysis"
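To make the idea of placing a query on a fixed tree concrete, here is a toy sketch. It is not pplacer's method: pplacer scores placements by likelihood or posterior probability, whereas this sketch swaps in Fitch parsimony because it fits in a few lines. The nested-tuple tree encoding, the function names, and the example sequences are all assumptions made up for illustration. It only tries y as a new leaf on each edge (including the edge above the root), not as an ancestor.

```python
def fitch_score(tree, seqs):
    """Fitch parsimony score of a rooted binary tree, summed over alignment columns."""
    length = len(next(iter(seqs.values())))
    total = 0
    for col in range(length):
        def state_set(node):
            nonlocal total
            if isinstance(node, str):          # leaf: its observed character
                return {seqs[node][col]}
            left, right = (state_set(child) for child in node)
            common = left & right
            if common:
                return common
            total += 1                          # disjoint sets cost one change
            return left | right
        state_set(tree)
    return total


def placements(tree, new_leaf):
    """Yield (attachment_subtree, new_tree) for every edge, including the root edge."""
    yield tree, (tree, new_leaf)
    if not isinstance(tree, str):
        left, right = tree
        for attach, new_left in placements(left, new_leaf):
            yield attach, (new_left, right)
        for attach, new_right in placements(right, new_leaf):
            yield attach, (left, new_right)


def best_placements(tree, seqs, new_leaf):
    """Return all (score, attachment_subtree, new_tree) tuples with the minimal score."""
    scored = [(fitch_score(new_tree, seqs), attach, new_tree)
              for attach, new_tree in placements(tree, new_leaf)]
    best = min(score for score, _, _ in scored)
    return [item for item in scored if item[0] == best]


# Hypothetical toy data: four aligned reference sequences plus a query y.
tree = (("x1", "x2"), ("x3", "x4"))
seqs = {
    "x1": "AACGA",
    "x2": "AACGT",
    "x3": "TTGGA",
    "x4": "TTGCA",
    "y":  "TTGCA",
}

for score, attach, new_tree in best_placements(tree, seqs, "y"):
    print(score, attach, new_tree)
```

In this toy data y is identical to x4, so the single best placement attaches y as the sister of x4. With real data several edges can tie, which is why the function returns a list rather than one answer. Because parsimony ignores the root position, the edge above the root and the two edges below it always score the same.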


Just for completeness, I would also suggest MLTreeMap, which seems to be very similar to pplacer.


+1 Sounds great!


Great, thanks a lot!

14.0 years ago
lh3 33k

There are times when we want to keep the topology. You can:

  1. Add the new sequence to the existing multiple alignment with MUSCLE, or any aligner that supports profile alignment.

  2. Build the tree with "treebest nj -c old_tree.nh new_alignment.fasta". The relationships among the old sequences are always kept exactly as in the input tree. This is "hard" constraining.

  3. Alternatively, you can build the tree with "treebest phyml -C old_tree.nh new_alignment.fasta". In this case, however, the topology of the old tree might change if the alignment strongly agrees with an alternative topology. This is "soft" constraining.
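The steps above could be strung together roughly as follows. This is only a sketch: the helper name and the input file names are hypothetical, and the MUSCLE call assumes the v3.x profile-alignment syntax ("-profile -in1 -in2 -out"), which differs in later MUSCLE releases. The treebest arguments are the ones quoted in the steps. The script only builds and prints the commands; it does not run them.

```python
import shlex


def constrained_tree_commands(old_alignment, new_sequence, old_tree,
                              combined="new_alignment.fasta", soft=False):
    """Build the argument lists for profile alignment and constrained tree building."""
    # Step 1: add the new sequence to the existing alignment (MUSCLE v3.x syntax assumed).
    align = ["muscle", "-profile",
             "-in1", old_alignment,
             "-in2", new_sequence,
             "-out", combined]
    # Step 2 or 3: "nj -c" is the hard constraint, "phyml -C" the soft one.
    if soft:
        build = ["treebest", "phyml", "-C", old_tree, combined]
    else:
        build = ["treebest", "nj", "-c", old_tree, combined]
    return align, build


# Hypothetical file names, for illustration only.
align_cmd, build_cmd = constrained_tree_commands(
    "old_alignment.fasta", "new_sequence.fasta", "old_tree.nh")

print(shlex.join(align_cmd))
print(shlex.join(build_cmd))
```

Passing soft=True switches the second command to the "phyml -C" variant. Once the printed commands look right, they can be run by hand or handed to subprocess.run.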


"nj" does constrained neighbour joining, which is described in a PhD thesis. "phyml" is modified from an old version of the PhyML program. It penalizes bisections that disagree with the input topology. No detailed description of that algorithm is available.


Thanks for this information, I'll try and find out what algorithms they use.

14.0 years ago

I do not know what the literature says about this scenario, but I know from practice (doing this myself and talking to other scientists who have either written phylogenetic analysis software or done this type of analysis) that adding a sequence or taxon to a tree is best done by re-evaluating all relationships. In other words, you should recalculate the tree, because the new sequence can alter many relationships between pairs of "old" sequences, and it is itself related both to those "old" sequences and to the ancestral sequences at each node.

13.5 years ago

PAGAN will accept new sequences (reads) to be added to an existing alignment (ref_alignment) and its associated tree (ref_tree). To try only a subset of nodes, label them in the tree in NHX format; otherwise, use the slower --exhaustive option.

Example:

pagan --ref-seqfile ref_alignment_file --ref-treefile ref_tree_file --readsfile reads_file