Clustering of trinary variable for drug-target similarity

7.0 years ago

kakukeshi

Hi guys,

I have a 2D matrix of drug-target relationships where each value represents the action of the drug on the target. Let's say 1 is activation, -1 is inhibition and 0 is unknown (or no effect). What strategy would you recommend for clustering the drugs by their similarity in this matrix? Here is an example:

                 Targets
          ProtA   ProtB   ProtC
Drug1       1       0       1       
Drug2       0       1       0             
Drug3       0      -1      -1

In this case, Drug1 activates proteins A and C, Drug2 activates protein B, and Drug3 inhibits proteins B and C.

Tags: clustering, R

updated 7.0 years ago by Jean-Karim Heriche • written 7.0 years ago by kakukeshi

score 0 · Answer 1 · 2017-11-28

This is categorical data, so you could look into similarity measures for categorical data such as the overlap measure, i.e. the number of proteins on which the two drugs being compared have the same effect. There are ways of normalizing this, such as Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. You could also look into entropy-based measures. Gower's coefficient and TF-IDF are also worth considering. Which measure to choose is often problem-dependent: for example, should activation and inhibition have the same weight, and should 0 be ignored?
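A minimal sketch, in Python with only the standard library and assuming the 3x3 example matrix from the question, of two of the measures mentioned above: the overlap measure (here divided by the number of targets compared) and Cohen's kappa, followed by a naive average-linkage hierarchical clustering on the distance 1 - overlap. The function names and the `ignore_zeros` switch (which drops targets where both drugs are 0, one possible reading of "should 0 be ignored") are hypothetical choices for illustration. Entropy-based measures, Gower's coefficient and TF-IDF are not shown, and activation and inhibition are simply treated as different categories with equal weight.

```python
from itertools import combinations

# Example matrix from the question: 1 = activation, -1 = inhibition,
# 0 = unknown / no effect. Columns are ProtA, ProtB, ProtC.
DRUGS = {
    "Drug1": [1, 0, 1],
    "Drug2": [0, 1, 0],
    "Drug3": [0, -1, -1],
}


def overlap_similarity(a, b, ignore_zeros=False):
    """Fraction of targets on which two drugs have the same effect.

    ignore_zeros=True (an assumption, one way to "ignore 0") skips targets
    where BOTH drugs are 0, so shared unknowns do not count as agreement.
    """
    pairs = [(x, y) for x, y in zip(a, b)
             if not (ignore_zeros and x == 0 and y == 0)]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def cohens_kappa(a, b, categories=(-1, 0, 1)):
    """Observed agreement corrected for agreement expected by chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each drug's own frequency of -1, 0 and 1.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both profiles are one identical constant; kappa is 0/0
    return (observed - expected) / (1 - expected)


def average_linkage(names, dist):
    """Naive agglomerative clustering with average linkage.

    dist maps frozenset({drug_x, drug_y}) -> distance. Returns the merge
    history as (members_a, members_b, distance) tuples.
    """
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), d))
    return merges


# Distance = 1 - overlap similarity, for every pair of drugs.
dist = {frozenset((x, y)): 1 - overlap_similarity(DRUGS[x], DRUGS[y])
        for x, y in combinations(DRUGS, 2)}

if __name__ == "__main__":
    for x, y in combinations(DRUGS, 2):
        print(x, y,
              round(overlap_similarity(DRUGS[x], DRUGS[y]), 3),
              round(cohens_kappa(DRUGS[x], DRUGS[y]), 3))
    for step in average_linkage(list(DRUGS), dist):
        print(step)
```

On this toy matrix Drug2 and Drug3 are merged first only because both are 0 on ProtA; with `ignore_zeros=True` that agreement disappears, which shows how much the treatment of 0 can change the result. For a real matrix, a precomputed distance matrix `D` could instead be passed to R's `hclust(as.dist(D), method = "average")` rather than using the hand-written loop.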