Question

Compute Affinity Matrix From Distance Matrix



12.3 years ago

ericmajinglong ▴ 120

Hi guys,

I used Clustal Omega to get a distance matrix for 500 protein sequences (they are homologous to each other).

I want to use affinity propagation to cluster these sequences.

Initially, because I observed by hand that the distance matrix only had values between 0 and 1, with 0 distance = 100% identity, I reasoned that I could just take (1 - distance) to get affinity.

I ran my code, and the clusters looked reasonable, and I thought all was well... until I read that typically, affinity matrices are calculated from distance matrices by applying a "heat kernel". That's when all hell broke loose in my mind.

Did I misunderstand the concept of an affinity matrix? Is there an easy way of computing the affinity matrix? scikit-learn offers the following formula:

similarity = np.exp(-beta * distance / distance.std())

But what is beta, and what is distance.std()?

I'm quite confused and lost right now with the concepts involved (as opposed to the actual coding implementation), so any help is greatly appreciated!

clustalw protein multiple-alignment distance • 12k views

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 12.3 years ago by ericmajinglong ▴ 120

Ram · Answer 1 · 2014-12-19



10.6 years ago

learnBioinformatics ▴ 60

In R, it can be calculated as follows.

For example, suppose you have a 200 x 300 matrix A (200 sequences, each represented by 300 features):

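library(fields)                      # rdist() is a fast way to compute pairwise Euclidean distances
d <- rdist(A)                        # 200 x 200 distance matrix between the rows of A
sigma <- mean(d)                     # kernel bandwidth; the mean distance is one simple heuristic
simMat <- exp(-d^2 / (2 * sigma^2))  # Gaussian (heat) kernel; bandwidth in the same units as d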

Here, simMat may be what you want.

Hope this helps.

ADD COMMENT • link updated 3.4 years ago by Ram 45k • written 10.6 years ago by learnBioinformatics ▴ 60

Ram · Answer 2 · 2014-12-22

An affinity matrix is simply a similarity matrix used as input to the affinity propagation algorithm. From http://www.psi.toronto.edu/index.php?q=affinity%20propagation:

Affinity propagation ... takes as input measures of similarity between pairs of data points ...

In the context of clustering, a similarity measure simply runs opposite to a distance, i.e. a distance of 0 means the highest similarity. If your distance metric d lies between 0 and 1, then s = 1 - d is a valid similarity measure. You can also convert a distance into a similarity using a radial basis function (a.k.a. Gaussian or heat kernel).
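
To make the two options concrete, here is a minimal Python sketch. It assumes NumPy and a small made-up distance matrix; the helper name distance_to_similarity and the default beta=1.0 are illustrative choices, not part of any library. It uses the exponential form quoted in the question, where beta is a free scaling parameter and distance.std() is the standard deviation of all entries in the matrix, used to put the distances on a unit-free scale.

```python
import numpy as np


def distance_to_similarity(distance, beta=1.0):
    """Turn a distance matrix into similarities with an exponential kernel.

    beta is a free scaling parameter (1.0 is an arbitrary default here);
    larger values make similarity fall off faster as distance grows.
    distance.std() is the standard deviation over all matrix entries.
    """
    distance = np.asarray(distance, dtype=float)
    return np.exp(-beta * distance / distance.std())


# Toy symmetric distance matrix for three sequences (made-up values in [0, 1]).
d = np.array([
    [0.0, 0.1, 0.8],
    [0.1, 0.0, 0.7],
    [0.8, 0.7, 0.0],
])

linear = 1.0 - d                    # valid because every distance lies in [0, 1]
kernel = distance_to_similarity(d)  # exponential kernel from the question

print(linear)
print(kernel)
```

Both transforms decrease as distance increases, so they rank pairs in the same order; they differ only in how quickly similarity decays. Either matrix could then be passed to scikit-learn's AffinityPropagation with affinity="precomputed".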