Neighbor Joining algorithm: why n-2?
1
0
Entering edit mode
6.9 years ago
michael • 0

I have been looking into how the neighbor joining algorithm works and I have two questions about the first step, i.e. computing the matrix Q using the following formula:

Q(i,j) = (n - 2)d(i,j) - Σd(i,k) - Σd(j,k)

Where both summations are from k = 1 to n, n is the number of taxons, d() is the distance between two elements and i and j are specific taxons.

From my understanding this formula creates a new matrix containing the distance between leaves i and j multiplied by (n-2). From this value we subtract the total distance from i to all other nodes and the total distance from j to all other nodes. This gives us a sense of how far away any given pair of nodes is from all other clusters.

I think we are using the term (n-2) as a scaling factor to ensure that the value calculated by the two sums does not dominate the result. If this is the case, why is it n-2 and not n-1 or just n?

Finally, I have also encountered the following formula for Q:

Q(i,j) = d(i,j) - 1/(n-2)Σd(i,k) - 1/(n-2)Σd(j,k)

Are these two formulas equivalent? If so why?

Thank you.

neighbor-joining phylogenetics evolution • 2.5k views
ADD COMMENT
0
Entering edit mode

This is not a scaling factor, it comes from doing the sum over all leaves except i and j.

ADD REPLY
2
Entering edit mode
6.9 years ago
kloetzl ★ 1.1k

All of my references have a different notation, but I will try to answer anyway.

If this is the case, why is it n-2 and not n-1 or just n?

You kind of answered the question yourself in the paragraph above. You want (try to) to fuse two nodes i and j into a new node p. By doing that you remove one edge from the tree, hence n-1. Knock off another -1 for each "self distance" i.e. value of 0 in the sum, thus n-2.

Are these two formulas equivalent? If so why?

They differ only by a factor of n-2. However, what it most important is to join the nodes with the minimum distance; and that doesn't change.

ADD COMMENT

Login before adding your answer.

Traffic: 2041 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6