How to calculate connectivity score for a compound in connectivity map O2?
9.6 years ago
Zhilong Jia ★ 2.2k

In connectivity map O2 (build 2), the connectivity score for a compound is derived from the scores of multiple instances of that compound. I did not find any documentation about this in the original paper or on the website. Thank you.

For instance, for the drug H-7 with an average_score of 0.596, the enrichment score is 0.940, and the scores of the four H-7 instances are 0.629, 0.593, 0.585, 0.580. How do you get 0.940 from these four scores?

connectivity-map cmap connectivity-score • 6.1k views
6.0 years ago

For anyone else interested:

Here is a link to documentation of connectivity scores for the old CMap: https://portals.broadinstitute.org/cmap/help_topics_linkified.jsp (also nicely explained in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3868238/)

And here is the link to a description of the scores in the new CMap (v 2.1, clue.io): https://clue.io/connectopedia/cmap_algorithms. Mind you, the algorithms differ.

5.8 years ago

Let's start with the calculation of the connectivity scores (Si):

For an instance i (i.e. a perturbagen in specific conditions - cell/dose/time), the final score Si depends on "preliminary" scores si of all other instances:

For positively connected perturbagens (instances with positive values, si > 0), si is divided by the value of the most positively connected perturbagen:

$S_i = s_i / \max_{k \in \text{all instances}}(s_k)$

For negatively connected perturbagens (si < 0), si is divided by the negated value of the most negatively connected one:

$S_i = s_i / \left(-\min_{k \in \text{all instances}}(s_k)\right)$

Where:

$s_i = up_i - down_i$
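The normalization above can be sketched in a few lines of Python. This is only an illustration of the two formulas: the function name `normalize_scores` and the raw values are made up, not taken from CMap.

```python
def normalize_scores(raw_scores):
    """Scale raw scores s_i to connectivity scores S_i in [-1, 1]."""
    top = max(raw_scores)      # most positively connected instance
    bottom = min(raw_scores)   # most negatively connected instance
    normalized = []
    for s in raw_scores:
        if s > 0:
            normalized.append(s / top)        # S_i = s_i / max_k(s_k)
        elif s < 0:
            normalized.append(s / -bottom)    # S_i = s_i / (-min_k(s_k))
        else:
            normalized.append(0.0)
    return normalized


# Hypothetical raw scores, chosen only to make the arithmetic easy to follow.
raw = [0.8, 0.4, 0.0, -0.25, -0.5]
print(normalize_scores(raw))  # [1.0, 0.5, 0.0, -0.5, -1.0]
```

The best positive instance ends up at exactly 1 and the most negative one at exactly -1.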

In your example:

  • $S_{5941}$ = 0.629,
  • $S_{5968}$ = 0.593,
  • $S_{5963}$ = 0.585,
  • $S_{5936}$ = 0.580

Enrichment score is based on permutations

The connectivity scores Si are used to sort the list of all instances (perturbagens); if two substances have the same Si, the one with higher upi will be positioned higher. This gives us the rank - in your example, H-7 instances got ranks: 174, 305, 339, 368 (the higher the connectivity score, the higher the position on the list - or the lower the rank).
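A minimal sketch of that ordering, assuming each instance is given as a `(name, S, up)` tuple. The tuple layout, the function name `rank_instances` and the toy values are my own and only illustrate the tie-breaking rule.

```python
def rank_instances(instances):
    """Return {name: rank} for (name, S, up) tuples; rank 1 is the top of the list."""
    # Sort by connectivity score, highest first; ties are broken by higher up score.
    ordered = sorted(instances, key=lambda inst: (-inst[1], -inst[2]))
    return {name: position for position, (name, _, _) in enumerate(ordered, start=1)}


# Toy data: "b" and "c" share the same S, so "c" (higher up) is ranked above "b".
toy = [("a", 0.9, 0.5), ("b", 0.6, 0.2), ("c", 0.6, 0.4), ("d", -0.3, 0.1)]
print(rank_instances(toy))  # {'a': 1, 'c': 2, 'b': 3, 'd': 4}
```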

This list would have a total length of 6100 (the number of instances in the old CMap).

Once the ordering is ready, we can pose the following question:

are the chosen instances accumulated near the top of the sorted list of all instances?

and use the Kolmogorov-Smirnov (KS) statistic to assess that. A slightly simplified version is to look at the maximum of the absolute differences between:

  • a hypothetical, equal distribution along the list (let's call it j), and
  • the real distribution of the analyzed perturbagens (let's call it Vj)

As there are four instances considered, the distribution j would simply be:

1/4, 2/4, 3/4 and 4/4 (or [0.25, 0.50, 0.75, 1.00])

while the real distribution Vj of ranks is [174/6100, 305/6100, 339/6100, 368/6100], or [0.0285, 0.0500, 0.0556, 0.0603].

When we subtract the two cumulative distributions, |j - Vj| (NB it is a nice property of ranks: they give us cumulative distributions), we get:

[0.2215, 0.4500, 0.6944, 0.9397]

Where maximum of those is 0.9397 ~= 0.94. This is your enrichment score!
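The same arithmetic as a short Python sketch, using the four H-7 ranks and the list length of 6100 from above. The function name `simplified_enrichment` is just a label I picked, and this is the simplified version, not the full KS formula.

```python
def simplified_enrichment(ranks, n_total):
    """Maximum absolute difference between j/t and V(j)/n over the sorted ranks."""
    ranks = sorted(ranks)
    t = len(ranks)
    diffs = [abs(j / t - v / n_total) for j, v in enumerate(ranks, start=1)]
    return max(diffs)


es = simplified_enrichment([174, 305, 339, 368], 6100)
print(round(es, 4))  # 0.9397
```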

As I mentioned earlier, this is a simplification, as the proper KS calculation would subtract one when considering "negative" values. For detailed formulas, see this chapter of the documentation.
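For completeness, here is a sketch of the two-sided version as I understand it from the old CMap documentation: a = max over j of (j/t - V(j)/n), b = max over j of (V(j)/n - (j-1)/t), and the score is a if a > b, otherwise -b. Treat these formulas as my reading of the docs and check them against the linked chapter; the function name `ks_enrichment` is mine.

```python
def ks_enrichment(ranks, n_total):
    """Signed KS-style enrichment score for a set of ranks in a list of n_total items."""
    ranks = sorted(ranks)
    t = len(ranks)
    # a: how far the set runs ahead of the list (enrichment near the top)
    a = max(j / t - v / n_total for j, v in enumerate(ranks, start=1))
    # b: how far the set lags behind (enrichment near the bottom); note the (j - 1)
    b = max(v / n_total - (j - 1) / t for j, v in enumerate(ranks, start=1))
    return a if a > b else -b


print(round(ks_enrichment([174, 305, 339, 368], 6100), 4))  # 0.9397
```

For the H-7 ranks, a dominates, so the result matches the simplified calculation. A set of instances sitting near the bottom of the list would get a negative score instead.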

Ps. This plot may help to understand the KS:

https://i.imgur.com/TpzMcJE.png


Do you happen to know what happens if we score a query signature with only one sign for all genes? So that the calculation for what you refer to as up_i (and the authors refer to as ks_i) is not possible to do in a signature of negatives for example. Simply substitute zero? To clarify in the language of the original paper, the up tag list would be empty in such a case, hence the a and b calculations for it not possible.
