calculating confidence intervals of bootstrapped trees
5.0 years ago
Moses ▴ 150

Hi All,

I have a phylogenetic tree for over 3,000 prokaryotic genomes. I have made many replicates of this tree using a jackknife approach, so I now have many variants of it. I want to calculate the confidence for the branches of this tree by checking how many of my clades stay monophyletic in the generated trees.

Is there a standard piece of software that takes many trees as input (in some tree format, e.g. Newick) and returns the percentage of conserved monophyletic groups using the majority rule?

I'm aware that there are many variables in my question. First of all, how would I define a clade? If my species were taxonomically annotated, then I could define clades at different levels, e.g. genus, family, order, etc. However, let's assume that I do not have this annotation and want to report some numbers on how much the tree changes or stays conserved: how many of the clades that are monophyletic in the original tree stay monophyletic in all the bootstrap replicates?

I have no prior experience with this and am not sure which direction to take. Any advice would be appreciated. Thank you.

phylogenetics trees bootstrapping • 1.7k views

What software did you use to create the jackknife-ed trees? PAUP* can perform jackknife replicates and provide a consensus tree. See page 77 of the manual http://www.phylo.org/sub_sections/PAUP_Cmd_ref_v2.pdf

and a tutorial here: http://ib.berkeley.edu/courses/ib200a/labs/ib200a_lab10_bootstrap_jackknife_bremer.pdf


Thank you for your reply, Amar. I made my own jackknifed trees because I am proposing my own way of constructing a phylogenetic tree, so I already have a list of tree variations. (I am not using the traditional approach of concatenating sequences, building an MSA, and then constructing a tree; that's why.)


I was looking into PAUP and I'm not sure it is going to work. It seems to take a sequence alignment as input and then starts removing alignment columns, etc. I already have the tree and will take it from there; as I said, I can also make my own jackknifed trees. So the problem is more like this: given a set of jackknifed trees, is there a program that will give me statistics about certain monophyletic clades and their relative frequencies across all the tree variations?

5.0 years ago
Joseph Hughes ★ 3.0k

In PAUP, you can summarise your jackknife trees either on the majority-rule tree or on a constraint tree of your choice that you provide to it. You can also use SumTrees (part of the DendroPy Python package) or the ape package in R to do the same thing.

You can also do something similar if you have a range of trees from a Bayesian inference using TreeAnnotator (part of BEAST).
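To make the idea of summarising replicate trees on a fixed target tree concrete, here is a minimal plain-Python sketch. It is not how PAUP, SumTrees, or ape implement it. The function names `clades` and `clade_support` are made up for this illustration, and the tiny Newick reader assumes rooted trees, unquoted tip labels, no comments, and the same set of tips in every tree.

```python
import re

def clades(newick):
    """Return the non-trivial clades of a rooted Newick tree as frozensets of tip names.

    Assumption: simple Newick only (no quoted labels, no [comments]).
    """
    stack = [set()]          # stack[0] ends up holding every tip in the tree
    found = set()
    prev = ""
    for tok in re.findall(r"[(),;]|[^(),;]+", newick):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            tips = stack.pop()
            found.add(frozenset(tips))
            stack[-1] |= tips
        elif tok not in (",", ";") and prev != ")":
            # A tip label, possibly followed by ":branch_length".
            # Text right after ")" is an internal label or length and is ignored.
            name = tok.split(":")[0].strip()
            if name:
                stack[-1].add(name)
        prev = tok
    n_tips = len(stack[0])
    # Drop the root clade (all tips) and any single-tip "clades".
    return {c for c in found if 1 < len(c) < n_tips}

def clade_support(reference, replicates):
    """Fraction of replicate trees in which each clade of the reference tree appears."""
    replicate_clades = [clades(t) for t in replicates]
    return {
        c: sum(c in rc for rc in replicate_clades) / len(replicate_clades)
        for c in clades(reference)
    }

# Toy data, purely illustrative.
reference = "((A,B),(C,D));"
replicates = ["((A,B),(C,D));", "(((A,B),C),D);", "((A,C),(B,D));"]
support = clade_support(reference, replicates)
for clade, freq in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(clade), round(freq, 2))
```

Each reference clade gets the fraction of replicates that contain exactly the same set of tips, which is the usual meaning of a bootstrap or jackknife support value on a node.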


Upvoting, plus some additional thoughts.

You have a tree, and you're interested in support values from a distribution of trees you've already generated (regardless of approach). You want to make a consensus tree (perhaps majority-rule to account for any uncertainty in your point estimate?), and you can get nodal support values (frequentist or Bayesian) from these trees using any of the approaches outlined in this answer. I don't know how you bootstrapped or jackknifed without an alignment - how did you model your sequence evolution?

You have a few other questions and pieces of information:

"I am not using the traditional sequence concatenation and then MSA then constructing a tree, that's why."

Concatenation assumes a single evolutionary history and model of sequence evolution applies across your alignment. There are established approaches for accounting for discordance among loci when inferring trees that have been developed over the past decade or so. You may consider searching the literature for "how to estimate species trees."

"I have a phylogenetic tree for over 3,000 prokaryotic genomes. I have made many replicates of this tree using jackknife approach. So now I have many variants of this tree"

Estimating trees from datasets this size can be hard, especially if you don't have the alignment. RAxML can handle large datasets and also perform bootstrapping. But, how are you even arriving at these sets of trees? What software? Are you dropping tips from your source tree and looking at, say, triplets or quartets? I don't understand and can't really provide guidance until I do.

"I'm aware that there are many variables in my question, first of all how would I define a clade? ..."

A clade is, by definition, a monophyletic group. Delimiting prokaryotic species is beyond the scope of this phylogenetic question. You may consider calculating a variety of tree distances (Robinson-Foulds, Kuhner-Felsenstein) to calculate discordance among sets of trees. Perhaps start here.
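As a rough illustration of the first of those distances, the Robinson-Foulds distance between two trees is the number of bipartitions (splits) found in one tree but not the other. The sketch below is plain Python rather than any particular package's implementation. The names `splits` and `robinson_foulds` are invented here, and it assumes simple Newick strings (no quoted labels or comments) with identical tip sets in both trees.

```python
import re

def splits(newick):
    """Return the non-trivial bipartitions of a tree, ignoring where the root is.

    Each split is stored as the side that does NOT contain an arbitrary
    anchor tip (the alphabetically smallest one), so that the same split
    always gets the same representation.
    """
    stack = [set()]          # stack[0] ends up holding every tip in the tree
    found = []
    prev = ""
    for tok in re.findall(r"[(),;]|[^(),;]+", newick):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            tips = stack.pop()
            found.append(frozenset(tips))
            stack[-1] |= tips
        elif tok not in (",", ";") and prev != ")":
            # Tip label, possibly with ":branch_length" attached.
            name = tok.split(":")[0].strip()
            if name:
                stack[-1].add(name)
        prev = tok
    all_tips = frozenset(stack[0])
    anchor = min(all_tips)
    result = set()
    for clade in found:
        side = all_tips - clade if anchor in clade else clade
        # Keep only splits with at least two tips on each side.
        if 1 < len(side) < len(all_tips) - 1:
            result.add(side)
    return result

def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))

# Toy trees, purely illustrative.
t1 = "((A,B),(C,(D,E)));"
t2 = "((A,C),(B,(D,E)));"
t3 = "(A,(B,(C,(D,E))));"   # same unrooted topology as t1, rooted differently
print(robinson_foulds(t1, t2))
print(robinson_foulds(t1, t3))
```

A distance of 0 means the two trees share every split. The Kuhner-Felsenstein (branch score) distance additionally uses branch lengths, which this sketch ignores.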

5.0 years ago
Moses ▴ 150

I was just looking for a simple way to get from a bag of trees to confidence intervals, using the majority rule on how often each monophyletic clade appears across the bootstrapped or jackknifed trees. I could have written my own script for this, but I wanted to use well-established software instead of reinventing the wheel.

The simple solution turned out to be the consense program from the PHYLIP package, which does exactly what I was asking for. You give it a bag of trees (one file containing all your trees in Newick format, one tree per line) and it returns the consensus tree along with the clades in the consensus and the frequency with which each appears across the input trees.

It also ran amazingly fast (less than a minute).

I will treat the consense program from PHYLIP as the solution to my question, for future reference and for anyone who runs into a similar question or has to report confidence scores.
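For anyone curious what this kind of majority-rule summary boils down to, here is a small plain-Python sketch of the counting step. It is not PHYLIP's code and it does not build the consensus tree itself; it only lists the clades found in more than half of the input trees. The names `clades` and `majority_rule_clades` are made up for this example, and it assumes rooted trees in simple Newick (no quoted labels or comments), one per line, all with the same tips.

```python
import re
from collections import Counter

def clades(newick):
    """Return the non-trivial clades of a rooted Newick tree as frozensets of tip names."""
    stack = [set()]          # stack[0] ends up holding every tip in the tree
    found = set()
    prev = ""
    for tok in re.findall(r"[(),;]|[^(),;]+", newick):
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            tips = stack.pop()
            found.add(frozenset(tips))
            stack[-1] |= tips
        elif tok not in (",", ";") and prev != ")":
            # Tip label, possibly with ":branch_length" attached.
            name = tok.split(":")[0].strip()
            if name:
                stack[-1].add(name)
        prev = tok
    n_tips = len(stack[0])
    return {c for c in found if 1 < len(c) < n_tips}

def majority_rule_clades(trees, threshold=0.5):
    """Map each clade seen in more than `threshold` of the trees to its frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))   # a clade is counted at most once per tree
    n = len(trees)
    return {c: k / n for c, k in counts.items() if k / n > threshold}

# Toy input standing in for a file with one Newick tree per line.
text = """((A,B),(C,D));
(((A,B),C),D);
((A,C),(B,D));
"""
trees = [line for line in text.splitlines() if line.strip()]
majority = majority_rule_clades(trees)
for clade, freq in majority.items():
    print(sorted(clade), round(freq, 2))
```

In the toy input only the clade {A, B} appears in more than half of the trees, so it is the only one reported.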


Fantastic, glad to hear you found an answer and thanks for coming back to post a solution!


Absolutely! Thank you for your guidance!
