Here is an R function that can merge the bootstrap values from the consensus tree with the actual branch lengths from another tree. It uses the ape package, which also includes some tree plotting functions.
The comments in the code have most of the details. I usually write my comments in roxygen format - I hope it isn't too difficult to read. It makes it easier to wrap the functions into a package later.
AFAIK, this function works correctly, but I have not tested it extensively, so you should probably manually check the output. I would be interested in hearing about any bugs you find.
#' @param bootTree The \code{phylo} object created from the
#' consensus tree (with bootstrap values in place
#' of branch lengths). This tree is usually rooted.
#' @param edge If TRUE (the default), the function returns a vector
#' of bootstrap values corresonding to the edges of
#' \code{branchTree}, otherwise a vector of bootstrap
#' values corresponding to the internal nodes. For plotting,
#' it is best to use edge=TRUE.
#' @return If edges is TRUE, a \code{character} vector with the bootstrap
#' value for each edge in \code{branchTree}. The value can be used as
#' an argument to \code{edgelabels}. Edges leading to leaves have blank
#' values. If edges is FALSE, a vector of bootstrap values for every internal
#' node in the tree (in order by the numbers given to the nodes). This can
#' be used as an argument to nodelabels in a plot or added as the \code{node.labels}
#' componenent of a \code{phylo} object. In this case, the bootstrap values
#' correspond to the parent branch of the node.
#'
#' @details The function is meant to take a \code{phylo} object created from
#' an unrooted tree with true branch lengths and a consensus tree in which the
#' bootstrap values are encoded by branch lengths. Typically the true branch length
#' tree from Phylip will be unrooted (although it may appear rooted in a plot) and the
#' consensus tree will be rooted (although it is meant to be interpreted as unrooted).
#' Bootstrap values are really properties of edges - they represent the support
#' for a particular split in the tree.
#'
#' This function works by first making a code for every internal node in the
#' tree that indicates the split represented by the branch leading to that node
#' from its parent. Using these codes, a matrix of matching nodes in the two
#' trees is generated. Then the function loops over the edges in
#' \code{branchTree}, taking the bootstrap value from the corresponding edge
#' in \code{bootTree} and skipping the branches leading to tips.
#'
#' One must be careful when changing the root or the rooted status of the
#' branchTree. The bootstrap values should be re-merged with the branchTree after
#' any manipulation that changes the topology. Caution is especially
#' important when writing the values to a file: the values can be written as
#' internal node labels, but they will not be correct if the tree is re-rooted
#' or if the tree is plotted as an unrooted tree (even though it actually is
#' unrooted). The function node2Edge converts bootstraps written as internal
#' node labels into edge labels. For plotting, it is best to use edge=TRUE.
#'
#' The split codes are generated by making a matrix with rows for taxa (at the tips)
#' and columns for the internal nodes in the tree. The taxa are in
#' alphabetical order, so that any trees with the same set of taxa can be
#' compared. Initially an entry in the matrix is TRUE if the internal node
#' contains the taxon as a child. The columns are adjusted so that the
#' minimum number of entries are TRUE (because the split AB | CDE is the
#' same as CDE | AB). Then each column is converted to hexadecimal code
#' that is easy to compare.
#'
#' @seealso \code{\link{phylo}} for an explanation of the tree data structure used here
mergeTrees <- function(branchTree, bootTree, edge=TRUE) {
require(ape)
.splitCode <- function(tree) {
pp <- prop.part(tree)
tipLabels <- attr(pp, "labels")
ordTipLabels <- order(tipLabels)
nTip <- Ntip(tree)
# Each column of out is an internal node; Each
# row is tip
out <- sapply(pp, function(x, ntip) {
ret <- numeric(ntip)
ret[ordTipLabels[x]] <- 1
as.logical(ret)
}, nTip)
out <- apply(out, 2, function(x) {
s <- sum(x)
l <- length(x) / 2
if(s > l)
!x
else if(s == l && x[1])
!x
else
x
})
nFillerBits <- ceiling(nrow(out) / 8) * 8 - nrow(out)
out <- rbind(out, matrix(nrow=nFillerBits, ncol=ncol(out), data=FALSE))
out <- apply(out, 2, function(x) paste(packBits(x), collapse=""))
attr(out, "labels") <- tipLabels[ordTipLabels]
cbind(1:length(pp) + Ntip(tree), out)
}
brSplits <- .splitCode(branchTree)
boSplits <- .splitCode(bootTree)
brSplits <- brSplits[brSplits[, 2] %in% boSplits[, 2], ]
boSplits <- boSplits[boSplits[, 2] %in% brSplits[, 2], ]
nodeNums <- cbind(brSplits[match(boSplits[, 2], brSplits[, 2]), 1],
boSplits[, 1])
storage.mode(nodeNums) <- "numeric"
if(edge)
bootstrap <- character(length(branchTree$edge.length))
else
bootstrap <- character(Nnode(branchTree))
for(i in seq_along(bootstrap)) {
if(edge)
brNode <- branchTree$edge[i, 2]
else
brNode <- i + Ntip(branchTree)
boNode <- nodeNums[nodeNums[, 1] == brNode, 2]
if(!length(boNode))
next
bootVal <- as.character(bootTree$edge.length[bootTree$edge[, 2] == boNode])
if(length(bootVal))
bootstrap[i] <- bootVal
}
bootstrap
}
And here are some examples of how to use it:
# Load ape and read in some data:
# Note that read.tree reads Newick format files.
library(ape)
br <- read.tree("/path/to/tree/with/branch_lengths")
bo <- read.tree("/path/to/consensus/tree/with/bootstraps")
# Now merge trees and make a plot
# Remember, if you want to root br, do that before running mergeTrees
bootstraps <- mergeTrees(br, bo, TRUE)
plot(br)
edgelabels(bootstraps, frame="none", adj=c(0.5, 0))
# An unrooted plot
plot(br, type="unrooted")
edgelabels(bootstraps, frame="none")
# Write the bootstrap values to a file as branch labels
bootstraps <- mergeTrees(br, bo, FALSE)
br$node.names <- bootstraps
write.tree(br, file="mytree.phy")
# Now read the tree back in and convert the node labels to
# edge labels for plotting. Note that if you root the tree,
# you will have to use mergeTrees again
mytree <- read.tree("mytree.phy")
bootstraps <- node2Edge(mytree)
plot(mytree); edgelabels(bootstraps)
#' Convert node labels to edge labels
#'
#' Makes a vector of edge labels from internal node names. Turns
#' the output of mergeTrees with edge=FALSE into the output
#' expected from edge=TRUE.
#'
#' @param tree a \code{phylo} object with internal node names
#' (\code{node.label} component)
#' @return a character vector of edge labels. The labels are placed
#' on the parent branch of the nodes.
node2Edge <- function(tree) {
nodeLabs <- tree$node.label
childNodes <- tree$edge[, 2]
edgeLabs <- character(length(childNodes))
for(i in seq_along(edgeLabs)) {
if(tree$edge[i, 2] <= Ntip(tree))
next
edgeLabs[i] <- nodeLabs[tree$edge[i, 2] - Ntip(tree)]
}
edgeLabs
}
Hey Panos, you've really answered your own question. It's fine to add support values with illustrator/inkscape but the topologies may differ (which is why consense doesn't give branch lengths)
You can also plot node support for the consensus in TreeView or APE. I HATE doing it by hand!