Let's start with the calculation of the connectivity scores (Si):
For an instance i (i.e. a perturbagen in specific conditions - cell/dose/time), the final score Si depends on "preliminary" scores si of all other instances:
For positively connected perturbagens (instances with positive values, si > 0), the preliminary score is divided by the value of the most positively connected perturbagen:
$$S_i = \frac{s_i}{\max_{k \in \text{all instances}} s_k}$$
For negatively connected perturbagens (si < 0), it is divided by minus the value of the most negatively connected one:
$$S_i = \frac{s_i}{-\min_{k \in \text{all instances}} s_k}$$
Where:
$$s_i = \mathrm{up}_i - \mathrm{down}_i$$
In your example:
- S5941 = 0.629,
- S5968 = 0.593,
- S5963 = 0.585,
- S5936 = 0.580
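The scaling described above can be sketched in a few lines of Python. This is only an illustration: the function name `normalise_scores` and the raw scores in `raw` are made up for the example and are not taken from CMap.

```python
def normalise_scores(raw):
    """Scale preliminary scores s_i into connectivity scores S_i."""
    s_max = max(raw.values())  # most positively connected instance
    s_min = min(raw.values())  # most negatively connected instance
    scaled = {}
    for name, s in raw.items():
        if s > 0:
            scaled[name] = s / s_max        # S_i = s_i / max_k(s_k)
        elif s < 0:
            scaled[name] = s / (-s_min)     # S_i = s_i / (-min_k(s_k))
        else:
            scaled[name] = 0.0              # unconnected instances stay at zero
    return scaled


# Hypothetical preliminary scores, for illustration only
raw = {"A": 0.8, "B": 0.4, "C": 0.0, "D": -0.2, "E": -0.5}
scaled = normalise_scores(raw)
print(scaled)
```

With this scaling the most positively connected instance always gets +1 and the most negatively connected one gets -1.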
Enrichment score is based on permutations
The connectivity scores Si are used to sort the list of all instances (perturbagens); if two substances have the same Si, the one with higher upi will be positioned higher. This gives us the rank - in your example, H-7 instances got ranks: 174, 305, 339, 368 (the higher the connectivity score, the higher the position on the list - or the lower the rank).
This list would have a total length of 6100 (the number of instances in the old CMap).
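As a rough sketch of the ordering step, assuming each instance is stored as a small dictionary: the instance names and the `up` values below are hypothetical, and only the sort rule (connectivity score first, then the up score as a tie-breaker) comes from the description above.

```python
# Hypothetical instances; "S" is the connectivity score, "up" the up score
instances = [
    {"name": "inst_a", "S": 0.593, "up": 0.30},
    {"name": "inst_b", "S": 0.629, "up": 0.35},
    {"name": "inst_c", "S": 0.593, "up": 0.41},  # ties with inst_a on S
    {"name": "inst_d", "S": -0.120, "up": 0.05},
]

# Sort by S descending; on equal S, the higher up score goes first
ordered = sorted(instances, key=lambda d: (-d["S"], -d["up"]))

# Rank 1 is the top of the list
ranks = {d["name"]: pos for pos, d in enumerate(ordered, start=1)}
print(ranks)
```

In the real data the list has 6100 entries, and the four H-7 instances end up at positions 174, 305, 339 and 368.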
Once the ordering is ready, we can pose the following question:
are the chosen instances accumulated near the top of the sorted list of all instances?
and use the Kolmogorov-Smirnov (KS) statistic to assess that. A slightly simplified version would be to look at the maximum of the absolute differences between:
- a hypothetical, even (uniform) distribution along the list (let's call it j), and
- the real distribution of the analyzed perturbagens (let's call it Vj)
As there are four instances considered, the distribution j would simply be:
1/4, 2/4, 3/4 and 4/4 (or [0.25, 0.50, 0.75, 1.00])
while the real distribution Vj of ranks is [174/6100, 305/6100, 339/6100, 368/6100], or [0.0285, 0.0500, 0.0556, 0.0603].
When we subtract the two cumulative distributions (NB it is a nice property of ranks: they give us cumulative distributions) and take the absolute value, |j - Vj|, we get:
[0.2215, 0.4500, 0.6944, 0.9397]
The maximum of those is 0.9397 ≈ 0.94. This is your enrichment score!
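The arithmetic above is easy to check in Python. This is a minimal sketch of the simplified calculation only, using the four ranks and the list length of 6100 from your example; the variable names are my own.

```python
ranks = [174, 305, 339, 368]  # ranks of the four H-7 instances
n = 6100                      # total number of instances in the list
t = len(ranks)                # number of instances in the set

# Evenly spread reference distribution: 1/4, 2/4, 3/4, 4/4
uniform = [(j + 1) / t for j in range(t)]

# Observed distribution: each rank as a fraction of the list length
observed = [r / n for r in sorted(ranks)]

diffs = [abs(u - o) for u, o in zip(uniform, observed)]
enrichment = max(diffs)

print([round(d, 4) for d in diffs])
print(round(enrichment, 4))  # 0.9397
```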
As I mentioned earlier, this is a simplification, as the proper KS calculation would subtract one when considering "negative" values. For detailed formulas, see this chapter of the documentation.
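For completeness, here is a sketch of the two-sided version as I understand it from the original CMap paper: a = max over j of (j/t - V(j)/n), b = max over j of (V(j)/n - (j-1)/t), and the score is a if a > b, otherwise -b. Treat the formulas as my reading rather than a verified reimplementation; the function name `ks_enrichment` and the second set of ranks are made up for illustration.

```python
def ks_enrichment(ranks, n):
    """Two-sided KS-style enrichment score for a set of ranks in a list of length n."""
    t = len(ranks)
    v = sorted(ranks)
    # a: how far the set runs ahead of an even spread (enrichment near the top)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    # b: how far it lags behind, using (j - 1) instead of j
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


top_score = ks_enrichment([174, 305, 339, 368], 6100)
# Hypothetical ranks clustered at the bottom of the list
bottom_score = ks_enrichment([5800, 5900, 6000, 6100], 6100)

print(round(top_score, 4))
print(round(bottom_score, 4))
```

For the four H-7 ranks this gives the same 0.94 as the simplified version, because a is much larger than b; a set clustered near the bottom of the list gets a negative score instead.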
Ps. This plot may help to understand the KS:
Do you happen to know what happens if we score a query signature in which all genes have the same sign? In a signature of only negatives, for example, the calculation of what you refer to as up_i (and the authors refer to as ks_i) is not possible. Do we simply substitute zero? To clarify in the language of the original paper: the up tag list would be empty in such a case, hence the a and b calculations for it are not possible.