Question

How to calculate the connectivity score for a compound in Connectivity Map 02?

9.6 years ago

Zhilong Jia ★ 2.2k

In Connectivity Map 02 (build 02), the connectivity score for a compound is derived from the scores of multiple instances of that compound. I did not find any documentation about this in the original paper or on the website. Thank you.

For instance, for the drug H-7 with an average_score of 0.596, the enrichment score is 0.940, and the scores of the four H-7 instances are 0.629, 0.593, 0.585, 0.580. How do you get 0.940 from these four scores?

connectivity-map cmap connectivity-score • 6.1k views

updated 21 months ago by Ram 44k • written 9.6 years ago by Zhilong Jia ★ 2.2k

score 2 · Answer 1 · 2018-12-12

For anyone else interested:

Here is a link to documentation of connectivity scores for the old CMap: https://portals.broadinstitute.org/cmap/help_topics_linkified.jsp (also nicely explained in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3868238/)

And here is a link to a description of the scores in the new CMap (v 2.1, clue.io): https://clue.io/connectopedia/cmap_algorithms. Mind you, the algorithms differ.

score 1 · Answer 2 · 2019-02-18

Let's start with the calculation of the connectivity scores (S_i):

For an instance i (i.e. a perturbagen under specific conditions: cell/dose/time), the final score S_i is obtained by normalizing its "preliminary" score s_i against the preliminary scores of all instances:

For positively connected perturbagens (instances with positive values, s_i > 0), s_i is divided by the value of the most positively connected perturbagen:

S_i = s_i / (max_{k ∈ all instances}(s_k))

For negatively connected perturbagens (s_i < 0), s_i is divided by minus the value of the most negatively connected one:

S_i = s_i / (-min_{k ∈ all instances}(s_k))

Where:

s_i = up_i - down_i
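The two normalization rules above can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_scores` and the raw values are made up, and the sketch takes the preliminary scores s_i as given instead of computing up_i and down_i.

```python
def normalize_scores(raw_scores):
    """Scale preliminary scores s_i into connectivity scores S_i in [-1, 1]."""
    largest = max(raw_scores)    # most positively connected instance
    smallest = min(raw_scores)   # most negatively connected instance
    normalized = []
    for s in raw_scores:
        if s > 0:
            normalized.append(s / largest)
        elif s < 0:
            normalized.append(s / -smallest)
        else:
            normalized.append(0.0)
    return normalized


# Hypothetical preliminary scores, not taken from CMap.
raw = [0.8, 0.4, 0.0, -0.25, -0.5]
connectivity = normalize_scores(raw)
print(connectivity)  # [1.0, 0.5, 0.0, -0.5, -1.0]
```

After this scaling, the most positively connected instance always gets +1 and the most negatively connected one gets -1.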

In your example:

S₅₉₄₁ = 0.629,
S₅₉₆₈ = 0.593,
S₅₉₆₃ = 0.585,
S₅₉₃₆ = 0.580

Enrichment score is based on permutations

The connectivity scores S_i are used to sort the list of all instances (perturbagens); if two substances have the same S_i, the one with higher up_i will be positioned higher. This gives us the rank - in your example, H-7 instances got ranks: 174, 305, 339, 368 (the higher the connectivity score, the higher the position on the list - or the lower the rank).

This list would have a total length of 6100 (the number of instances in the old CMap).
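The ordering step can be sketched as a sort on the pair (S_i, up_i), both descending. The instance ids, the dictionary layout and the numbers below are hypothetical and only serve to show the tie-breaking rule.

```python
def rank_instances(instances):
    """Return {instance id: rank}, where rank 1 is the most positively connected."""
    # Sort by connectivity score first; ties are broken by the higher up score.
    ordered = sorted(instances, key=lambda inst: (inst["S"], inst["up"]), reverse=True)
    return {inst["id"]: position for position, inst in enumerate(ordered, start=1)}


# Made-up instances: "a" and "b" share the same S, so "up" decides their order.
toy = [
    {"id": "a", "S": 0.629, "up": 0.30},
    {"id": "b", "S": 0.629, "up": 0.35},
    {"id": "c", "S": 1.000, "up": 0.50},
    {"id": "d", "S": -0.400, "up": -0.20},
]
ranks = rank_instances(toy)
print(ranks)  # {'c': 1, 'b': 2, 'a': 3, 'd': 4}
```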

Once the ordering is ready, we can pose the following question:

are the chosen instances accumulated near the top of the sorted list of all instances?

and use the Kolmogorov-Smirnov (KS) statistic to assess that. A slightly simplified version would be to look at the maximum of the absolute differences between:

a hypothetical, uniform distribution along the list (let's call it j), and
the real distribution of the analyzed perturbagens (let's call it Vj)

As there are four instances considered, the distribution j would simply be:

1/4, 2/4, 3/4 and 4/4 (or [0.25, 0.50, 0.75, 1.00])

while the real distribution Vj of ranks is [174/6100, 305/6100, 339/6100, 368/6100], or [0.0285, 0.0500, 0.0556, 0.0603].

When we subtract the two cumulative distributions and take the absolute value, |j - Vj| (NB it is a nice property of ranks that they give us cumulative distributions), we get:

[0.2215, 0.4500, 0.6944, 0.9397]

The maximum of these is 0.9397 ≈ 0.94. This is your enrichment score!
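The arithmetic above can be reproduced with a short Python sketch. The function name `simplified_enrichment` is made up for this illustration, and it implements only the simplified version described here, not the full KS calculation from the documentation.

```python
def simplified_enrichment(ranks, n_instances):
    """Maximum absolute gap between j/t and V(j)/n over the t chosen instances."""
    ranks = sorted(ranks)
    t = len(ranks)
    diffs = [abs(j / t - v / n_instances) for j, v in enumerate(ranks, start=1)]
    return max(diffs)


h7_ranks = [174, 305, 339, 368]   # ranks of the four H-7 instances
score = simplified_enrichment(h7_ranks, 6100)
print(round(score, 4))  # 0.9397
```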

As I mentioned earlier, this is a simplification, as the proper KS calculation would subtract one when considering "negative" values. For detailed formulas, see this chapter of the documentation.
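One possible reading of the "subtract one" remark is the two-sided form sketched below. The split into `a` (instances near the top) and `b` (instances near the bottom, using j - 1), and the rule for picking the sign, are assumptions about what the documentation specifies and should be checked against it. The function name `ks_enrichment` and the bottom-of-list ranks are made up.

```python
def ks_enrichment(ranks, n_instances):
    """Signed KS-style score: positive near the top of the list, negative near the bottom."""
    ranks = sorted(ranks)
    t = len(ranks)
    # a: how far the chosen instances run ahead of a uniform spread (top enrichment).
    a = max(j / t - v / n_instances for j, v in enumerate(ranks, start=1))
    # b: how far they lag behind it; note the (j - 1), i.e. "subtract one" (assumed form).
    b = max(v / n_instances - (j - 1) / t for j, v in enumerate(ranks, start=1))
    return a if a > b else -b


top = ks_enrichment([174, 305, 339, 368], 6100)         # H-7 ranks from the question
bottom = ks_enrichment([5733, 5762, 5796, 5927], 6100)  # hypothetical mirrored ranks
print(round(top, 4))  # 0.9397
print(bottom < 0)     # True
```

For the H-7 ranks this gives the same value as the simplified version, because all four instances sit near the top of the list.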

Ps. This plot may help to understand the KS: