Question

How To Create A Phylogram In Phylip With Bootstrap Values On It?

3

Entering edit mode

14.1 years ago

Panos ★ 1.8k

This is related to one of my previous questions.

By phylogram I mean a phylogenetic tree where branch length is proportional to the divergence in the nucleotide level (correct me if I'm wrong).

I had the impression that such a tree would be created by using the phylip programs "seqboot->dnadist->neighbor->consense" but apparently (look at the bottom of page 9) this is not the case.

One option would be to first build a phylogram using "neighbor", then do bootstrap and using Photoshop, for example, add the bootstrap values to the phylogram. The problem that I see using this approach is that there is the possibility for the topology of the phylogram to be different compared to that of the cladogram containing the bootstrap values.

• 14k views

ADD COMMENT • link updated 12.1 years ago by doudabensasson ▴ 10 • written 14.1 years ago by Panos ★ 1.8k

1

Entering edit mode

Hey Panos, you've really answered your own question. It's fine to add support values with illustrator/inkscape but the topologies may differ (which is why consense doesn't give branch lengths)

ADD REPLY • link 14.1 years ago by David W 4.9k

0

Entering edit mode

You can also plot node support for the consensus in TreeView or APE. I HATE doing it by hand!

ADD REPLY • link 12.1 years ago by Zev.Kronenberg 12k

score 2 · Answer 1 · 2012-03-07

Here is an R function that can merge the bootstrap values from the consensus tree with the actual branch lengths from another tree. It uses the ape package, which also includes some tree plotting functions.

The comments in the code have most of the details. I usually write my comments in roxygen format - I hope it isn't too difficult to read. It makes it easier to wrap the functions into a package later.

AFAIK, this function works correctly, but I have not tested it extensively, so you should probably manually check the output. I would be interested in hearing about any bugs you find.

#' @param bootTree   The \code{phylo} object created from the 
#'                   consensus tree (with bootstrap values in place
#'                   of branch lengths). This tree is usually rooted.
#' @param edge       If TRUE (the default), the function returns a vector
#'                   of bootstrap values corresonding to the edges of 
#'                   \code{branchTree}, otherwise a vector of bootstrap
#'                   values corresponding to the internal nodes. For plotting,
#'                   it is best to use edge=TRUE.
#' @return If edges is TRUE, a \code{character} vector with the bootstrap 
#' value for each edge in \code{branchTree}. The value can be used as
#' an argument to \code{edgelabels}. Edges leading to leaves have blank
#' values. If edges is FALSE, a vector of bootstrap values for every internal 
#' node in the tree (in order by the numbers given to the nodes). This can
#' be used as an argument to nodelabels in a plot or added as the \code{node.labels}
#' componenent of a \code{phylo} object. In this case, the bootstrap values
#' correspond to the parent branch of the node.
#'
#' @details The function is meant to take a \code{phylo} object created from
#' an unrooted tree with true branch lengths and a consensus tree in which the
#' bootstrap values are encoded by branch lengths. Typically the true branch length
#' tree from Phylip will be unrooted (although it may appear rooted in a plot) and the
#' consensus tree will be rooted (although it is meant to be interpreted as unrooted).
#' Bootstrap values are really properties of edges - they represent the support
#' for a particular split in the tree.
#'
#' This function works by first making a code for every internal node in the
#' tree that indicates the split represented by the branch leading to that node
#' from its parent. Using these codes, a matrix of matching nodes in the two
#' trees is generated. Then the function loops over the edges in 
#' \code{branchTree}, taking the bootstrap value from the corresponding edge 
#' in \code{bootTree} and skipping the branches leading to tips.
#'
#' One must be careful when changing the root or the rooted status of the
#' branchTree. The bootstrap values should be re-merged with the branchTree after
#' any manipulation that changes the topology. Caution is especially 
#' important when writing the values to a file: the values can be written as
#' internal node labels, but they will not be correct if the tree is re-rooted
#' or if the tree is plotted as an unrooted tree (even though it actually is
#' unrooted). The function node2Edge converts bootstraps written as internal
#' node labels into edge labels. For plotting, it is best to use edge=TRUE. 
#'
#' The split codes are generated by making a matrix with rows for taxa (at the tips)
#' and columns for the internal nodes in the tree. The taxa are in 
#' alphabetical order, so that any trees with the same set of taxa can be
#' compared. Initially an entry in the matrix is TRUE if the internal node 
#' contains the taxon as a child. The columns are adjusted so that the
#' minimum number of entries are TRUE (because the split AB | CDE is the 
#' same as CDE | AB). Then each column is converted to hexadecimal code
#' that is easy to compare.
#'
#' @seealso \code{\link{phylo}} for an explanation of the tree data structure used here
mergeTrees <- function(branchTree, bootTree, edge=TRUE) {
    require(ape)
    .splitCode <- function(tree) {
        pp           <- prop.part(tree)
        tipLabels    <- attr(pp, "labels")
        ordTipLabels <- order(tipLabels)
        nTip         <- Ntip(tree)
        # Each column of out is an internal node; Each
        # row is tip
        out <- sapply(pp, function(x, ntip) {
                             ret <- numeric(ntip)
                             ret[ordTipLabels[x]] <- 1
                             as.logical(ret)
                          }, nTip) 
        out <- apply(out, 2, function(x) {
                                 s <- sum(x)
                                 l <- length(x) / 2
                                 if(s > l)
                                     !x
                                 else if(s == l && x[1])
                                     !x
                                 else
                                     x
                              })
        nFillerBits <- ceiling(nrow(out) / 8) * 8 - nrow(out)
        out <- rbind(out, matrix(nrow=nFillerBits, ncol=ncol(out), data=FALSE))
        out <- apply(out, 2, function(x) paste(packBits(x), collapse=""))
        attr(out, "labels") <- tipLabels[ordTipLabels]
        cbind(1:length(pp) + Ntip(tree), out)
    }

    brSplits <- .splitCode(branchTree)
    boSplits <- .splitCode(bootTree)
    brSplits <- brSplits[brSplits[, 2] %in% boSplits[, 2], ]
    boSplits <- boSplits[boSplits[, 2] %in% brSplits[, 2], ]

    nodeNums <- cbind(brSplits[match(boSplits[, 2], brSplits[, 2]), 1],
                      boSplits[, 1])
    storage.mode(nodeNums) <- "numeric"

    if(edge)
        bootstrap <- character(length(branchTree$edge.length))
    else
        bootstrap <- character(Nnode(branchTree))

    for(i in seq_along(bootstrap)) {
        if(edge)
            brNode  <- branchTree$edge[i, 2]
        else
            brNode <- i + Ntip(branchTree)
        boNode <- nodeNums[nodeNums[, 1] == brNode, 2]
        if(!length(boNode))
            next
        bootVal <- as.character(bootTree$edge.length[bootTree$edge[, 2] == boNode])
        if(length(bootVal))
            bootstrap[i] <- bootVal
    }
    bootstrap
}

And here are some examples of how to use it:

# Load ape and read in some data:
# Note that read.tree reads Newick format files.
library(ape)
br <- read.tree("/path/to/tree/with/branch_lengths")
bo <- read.tree("/path/to/consensus/tree/with/bootstraps")

# Now merge trees and make a plot
# Remember, if you want to root br, do that before running mergeTrees
bootstraps <- mergeTrees(br, bo, TRUE)
plot(br)
edgelabels(bootstraps, frame="none", adj=c(0.5, 0))

# An unrooted plot
plot(br, type="unrooted")
edgelabels(bootstraps, frame="none")

# Write the bootstrap values to a file as branch labels
bootstraps <- mergeTrees(br, bo, FALSE)
br$node.names <- bootstraps
write.tree(br, file="mytree.phy")

# Now read the tree back in and convert the node labels to
# edge labels for plotting. Note that if you root the tree,
# you will have to use mergeTrees again
mytree <- read.tree("mytree.phy")
bootstraps <- node2Edge(mytree)
plot(mytree); edgelabels(bootstraps)

#' Convert node labels to edge labels
#'
#' Makes a vector of edge labels from internal node names. Turns
#' the output of mergeTrees with edge=FALSE into the output 
#' expected from edge=TRUE.
#'
#' @param tree a \code{phylo} object with internal node names 
#'             (\code{node.label} component)
#' @return a character vector of edge labels. The labels are placed
#' on the parent branch of the nodes.
node2Edge <- function(tree) {
    nodeLabs   <- tree$node.label
    childNodes <- tree$edge[, 2]
    edgeLabs   <- character(length(childNodes))
    for(i in seq_along(edgeLabs)) {
        if(tree$edge[i, 2] <= Ntip(tree))
            next
        edgeLabs[i] <- nodeLabs[tree$edge[i, 2] - Ntip(tree)]
    }
    edgeLabs
}

score 1 · Answer 2 · 2012-02-09

According to the PHYLIP FAQ this is the correct method for getting the bootstrap values and adding them to the tree. These can be added to the tree obtained using the original sequences with the same distance and tree methods. Since the bootstrap values are a measure of the confidence of the tree being correct, this will be correct. Alternatively use a different tool, see the Phylogeny Programs list on the PHYLIP site for some possibilities.

Some bootstrap programs produce a Newick format tree, which includes the bootstrap values as either node labels or comments (see MEGA description of Newick format for examples). Some tree drawing programs can use these to label the graphical representation of the tree. The Phylogeny Programs list on the PHYLIP site details a range of tree drawing tools.

score 1 · Answer 3 · 2013-06-05

I was also using phylip and ran into the problem of creating a phylogram with bootstrap values on it. While annotating the tree by hand is the official solution according to the phylip faq, this is not a good solution for me because I was using phylip in the hope of fully coding the generation of my tree figures.

In the end, I used the ape (analysis of phylogenetics and evolution) library in R for the whole phylogenetic analysis

Here is an example of how to construct and view a neighbour joining distance phylogram with bootstraps shown in R (maximum likelihood and other phylogenetic methods are also available):

library(ape)
data(woodmouse)
f <- function(x) nj(dist.dna(x))
tr <- f(woodmouse)
bs<-boot.phylo(tr, woodmouse, f, quiet = TRUE)
plot(tr)
nodelabels(bs)

To export your phylogram with bootstraps in a format which is viewable by other programs (e.g. figtree):

tr$node.label<-bs
write.tree(tr,"njwithbs.tree")

To use your own DNA sequence alignment (mydata) instead of woodmouse:

mydata<-read.dna("alcat.gfa",format="fasta")

NB I also tried the R functions posted in the other answer here, but in my case only 1 node got annotated (instead of the > 50 that I was expecting). Perhaps merging trees when using around 80 strains is too difficult because of small differences between phylip's consensus tree and the phylogram.