Hi, new user here. I originally posted this question on http://www.reddit.com/r/bioinformatics/ but it was suggested that I should ask here as well.
I'm currently implementing an algorithm for calculating the likelihood of a given phylogenetic tree. I'm reading chapter 16 in Joseph Felsenstein's Inferring Phylogenies which explains this rather nicely, but I am stumped by one thing and would like to hear if you can help me.
Given the following tree, T,
           x|
            /\
           /  \
          /    \
       t6/      \t8
        /        \z
      y/         /\t7
      /\       t3/  \w
   t1/  \t2     /  t4/\t5
    /    \     /    /  \
   A      C   C    C    G
and an assumption that evolution in different lineages is independent, we get the following:
P(A,C,C,C,G,x,y,z,w|T) = P(x)×P(y|x,t6)×P(A|y,t1)×…×P(G|w,t5),
i.e. each probability is only dependent on what's above it and the branch length. Ok, so far, so good. But then he writes:
The probability of x may be taken to be the probability that, at a random point on an evolving lineage, we would see base x (where x = A, C, G, or T). If we are allowed to assume that evolution has been proceeding for a very long time according to the particular model of base substitution that we are using, it is reasonable to take P(x) to be the equilibrium probability of base x under that model. The other probabilities are derived from the model of base substitution.
What I really don't understand is the text I highlighted in bold. (The rest was included for context.) Why does he suddenly talk about “a random point on an evolving lineage” when the text is talking about the tree shown above? Am I simply reading too much into his formulation or have I missed anything?
It was suggested that “the probability of x” may simply refer to any of the P(…) terms and not to the hypothetical state in the tree above, but I still find the phrasing weird.
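For concreteness, here is a minimal Python sketch of how I currently read the factorization above. Several things in it are my own assumptions and not from the book: I use the Jukes-Cantor model (so P(x) is simply the equilibrium frequency 1/4 for every base), the branch lengths t1 to t8 are made-up values, the names jc_prob and joint are hypothetical helpers, and the terms hidden behind the "…" are filled in from my reading of the tree. The likelihood of the observed tips is then the joint probability summed over all possible states of x, y, z and w.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_prob(child, parent, t):
    """Jukes-Cantor P(child | parent, t); t is in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e

# Hypothetical branch lengths t1..t8 (illustrative values only).
t = {1: 0.10, 2: 0.20, 3: 0.30, 4: 0.10, 5: 0.20, 6: 0.15, 7: 0.25, 8: 0.05}

def joint(x, y, z, w):
    """P(A,C,C,C,G,x,y,z,w | T) as a product of one term per branch."""
    return (
        0.25                         # P(x): equilibrium frequency under Jukes-Cantor
        * jc_prob(y, x, t[6])        # P(y | x, t6)
        * jc_prob("A", y, t[1])      # P(A | y, t1)
        * jc_prob("C", y, t[2])      # P(C | y, t2)
        * jc_prob(z, x, t[8])        # P(z | x, t8)
        * jc_prob("C", z, t[3])      # P(C | z, t3)
        * jc_prob(w, z, t[7])        # P(w | z, t7)
        * jc_prob("C", w, t[4])      # P(C | w, t4)
        * jc_prob("G", w, t[5])      # P(G | w, t5)
    )

# The interior states are unobserved, so sum over all 4^4 assignments.
likelihood = sum(joint(x, y, z, w) for x, y, z, w in product(BASES, repeat=4))
print(likelihood)
```

This brute-force sum is only meant to make the formula explicit; it grows exponentially with the number of interior nodes, so it is not how I would compute the likelihood for a real tree.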
nicely explained
thanks! that makes sense. :-)