Question

How to choose an outgroup for rooting a phylogenetic tree?

9.7 years ago

dago ★ 2.8k

I would like to have your opinion about an issue I am dealing with quite often during my work.

I often build phylogenetic trees for the proteins I am working with. For some of them, the phylogenetic relationship with other proteins has been well studied, so it is quite easy to go back to the literature and choose protein sequences to use as an outgroup.

In other cases, no studies are available on the phylogenetic relationships of these proteins. How would you then choose outgroup sequences to root the tree?

I often analyze the domain composition of the protein and then pick, as the outgroup, proteins that have similar catalytic domains but are involved in other biological functions/processes.
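A minimal sketch of this domain-based filter, assuming you already have a domain architecture (an ordered tuple of domain names) for every protein, e.g. from a domain-annotation search. The protein names, domain names and the helper `outgroup_candidates` are hypothetical:

```python
# Hypothetical domain architectures for the proteins of interest (ingroup)
# and for a pool of candidate proteins; all names are made up.
ingroup = {
    "protA": ("Kinase", "SH2"),
    "protB": ("Kinase", "SH2"),
}
candidates = {
    "protX": ("Kinase", "PH"),   # shares the catalytic domain, different architecture
    "protY": ("SH2",),           # lacks the catalytic domain
    "protZ": ("Kinase", "SH2"),  # same architecture as the ingroup
}


def outgroup_candidates(ingroup, candidates, catalytic):
    """Return candidates that contain the catalytic domain but whose overall
    architecture differs from every ingroup architecture."""
    ingroup_archs = set(ingroup.values())
    return sorted(
        name
        for name, arch in candidates.items()
        if catalytic in arch and arch not in ingroup_archs
    )


print(outgroup_candidates(ingroup, candidates, "Kinase"))  # prints ['protX']
```

A shared domain only suggests, and does not prove, that the candidates are homologous at the aligned sites.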

Thanks very much for sharing your ideas.

phylogeny sequence alignment • 16k views

updated 13 months ago by andre.arrudalima ▴ 60 • written 9.7 years ago by dago ★ 2.8k

What I would do is look at the unrooted tree. It will show you which group diverges furthest from the rest of your taxa, and you can pick that group as the outgroup. Otherwise, just midpoint-root the tree for convenience.
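A rough sketch of the "most divergent group" check, assuming you have exported the pairwise patristic (tip-to-tip) distances from the unrooted tree into a symmetric matrix. It scores single taxa rather than whole clades, and the taxon names, distances and the helper `most_divergent` are hypothetical:

```python
def most_divergent(names, dist):
    """Return the taxon with the largest mean distance to all other taxa,
    together with the per-taxon means."""
    n = len(names)
    if n < 2:
        raise ValueError("need at least two taxa")
    means = {
        names[i]: sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    }
    return max(means, key=means.get), means


# Hypothetical patristic distances between four taxa (symmetric, zero diagonal).
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.3, 1.1],
    [0.2, 0.0, 0.25, 1.0],
    [0.3, 0.25, 0.0, 1.2],
    [1.1, 1.0, 1.2, 0.0],
]

candidate, means = most_divergent(names, dist)
print(candidate)  # prints D
```

This only flags a candidate: a long branch can also come from rate heterogeneity rather than early divergence.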

9.7 years ago by Adrian Pelin ★ 2.6k

Thanks for your suggestion. What do you think about the approach of looking for proteins containing similar domains, as I described in the post?

9.7 years ago by dago ★ 2.8k

I think it could work. But if those proteins are truly different, in the sense that they have unique catalytic domains, they should have longer branches, so you should be able to spot them on the unrooted tree grouping together and diverging from the rest, and then use them as the outgroup.

9.7 years ago by Adrian Pelin ★ 2.6k

Any other suggestions?

9.7 years ago by dago ★ 2.8k

Ram · Answer 1 · 2015-04-18

There appears to be a bit of a fundamental misunderstanding present in what you're doing, at least based on my interpretation of your question and responses. I also disagree with the discussion. Please correct me if I'm misinterpreting what you're saying.

The first step in building a tree, the sequence alignment, is an inference of homology. In other words, you are assuming that each site is truly homologous - has shared ancestry - across all individuals/samples. If you are using sequences that are not homologous, your tree is meaningless in an evolutionary context. An alignment should consist of DNA or amino acid sequences from the same protein across all your samples. You don't want to make a tree consisting of multiple proteins; this, too, is meaningless. If you are not confident with respect to the homology of your sequences and just want to build a tree describing similarity (a dendrogram), go ahead, but realize that it is not phylogenetic. You might also consider restricting analyses to the sites you can be confident are homologous, thus salvaging some of your data.
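A crude sketch of restricting the analysis to more reliably aligned sites, assuming a gap-fraction cutoff is an acceptable proxy. It is not a test of homology, and dedicated alignment-trimming tools use more elaborate criteria. The sequences, the 0.5 cutoff and the helper `filter_gappy_columns` are hypothetical:

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5):
    """Keep only alignment columns whose gap fraction is <= max_gap_fraction.

    Returns the trimmed alignment and the indices of the kept columns."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i
        for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    trimmed = {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
    return trimmed, keep


# Hypothetical protein alignment; "-" marks a gap.
aln = {
    "s1": "MK-LV--A",
    "s2": "MKQLV--A",
    "s3": "M--LVG-A",
    "s4": "MKQLI--S",
}

trimmed, keep = filter_gappy_columns(aln)
print(keep)  # prints [0, 1, 2, 3, 4, 7]
```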

The choice of an outgroup - where to root the tree - is also not arbitrary. It's a hypothesis, and the choice of which individual you use has many implications for the conclusions you might draw from the tree. Due to substitution rate heterogeneity and the influence of various evolutionary forces on a locus, the most distantly related individual or sample might not be an appropriate outgroup. In this sense, I disagree with Adrian in the comments, but I do agree with his recommendation to midpoint-root in the absence of any other information. Remember, a tree doesn't have to be rooted, and many popular analyses recommend/require an unrooted tree.

score 0 · Answer 2 · 2023-11-22

Hi,

I'm facing a similar issue. I'm working on constructing a phylogenetic tree for Arabidopsis protein orthologs, selecting one representative from each plant order. Additionally, I'm dividing the protein into intra- and extracellular domains to create two distinct phylogenies. The unrooted trees for the extracellular and intracellular domains show differences. I'm also aware of the well-established phylogenetic relationships between the species.

Given this context, how should I root my trees? Is it feasible to examine the alignment identity matrix and select the protein with the lowest percent identity in each tree as the outgroup (even if this means having different outgroups for the two phylogenies)? Alternatively, should I use a known species as the outgroup based on the established Viridiplantae phylogeny?
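A sketch of the identity-matrix heuristic described above, assuming percent identity is computed only over columns where neither sequence has a gap (one of several conventions). The sequences and the helpers `percent_identity` and `lowest_identity` are hypothetical. As the earlier answer notes, the least similar sequence is not automatically a suitable outgroup:

```python
from itertools import combinations


def percent_identity(s1, s2):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def lowest_identity(alignment):
    """Return the sequence with the lowest mean identity to all others,
    together with the per-sequence means."""
    names = list(alignment)
    scores = {n: [] for n in names}
    for a, b in combinations(names, 2):
        pid = percent_identity(alignment[a], alignment[b])
        scores[a].append(pid)
        scores[b].append(pid)
    means = {n: sum(v) / len(v) for n, v in scores.items()}
    return min(means, key=means.get), means


# Hypothetical aligned orthologs, one per species.
aln = {
    "sp1": "MKTAYIAK",
    "sp2": "MKTAYLAK",
    "sp3": "MKSAYIAK",
    "sp4": "MRSGFLEK",
}

name, means = lowest_identity(aln)
print(name)  # prints sp4
```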

Thank you.