How should I interpret the Enrichment Score (ES) returned by, for example, the GSEA function in R? Which is better, 0 or 1? If I get, for example, -0.17 and +0.80, what exactly is the biological interpretation of the pathways with these ES values? Finally, is it correct, or an error, to have some pathways with ES = 84, 11, and 238?
The sign of the score simply indicates which end of your ranked gene list is enriched. You provide the ranked list of genes, so the biological interpretation is up to you. If, for example, you provide a gene list ranked by a combination of fold change and p-value (e.g., sign(FC) * log10(pvalue)), then the positive scores are associated with upregulated genes and negative scores are associated with downregulated genes. Caution: some tools reverse this, so manually check a few gene sets to see which convention they are using.
ES values range from -1 to 1. Normalized ES (NES) values can go beyond these bounds. If you are seeing ES values > 1 (such as 84, 11, or 238), then I would suspect something is wrong.
Additional comment: You probably want to look at NES (Normalized ES) if that is provided by the tool you are using. The first two points above apply just the same.
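To illustrate why NES can exceed the [-1, 1] range of ES, here is a minimal Python sketch of the usual normalization idea: divide the observed ES by the mean magnitude of the same-sign ES values obtained from permutations. The function name `normalized_es` and the null values are hypothetical, made up for illustration; this is not the code of any particular GSEA implementation, and real tools may differ in detail.

```python
def normalized_es(es, null_es):
    """Divide an observed ES by the mean |ES| of same-sign permutation scores.

    es      : observed enrichment score, expected in [-1, 1]
    null_es : ES values from permuted data (hypothetical inputs here)
    """
    # Keep only null scores on the same side of zero as the observed ES.
    same_sign = [x for x in null_es if (x >= 0) == (es >= 0)]
    if not same_sign:
        raise ValueError("no permutation scores share the sign of the observed ES")
    mean_abs = sum(abs(x) for x in same_sign) / len(same_sign)
    if mean_abs == 0:
        raise ValueError("same-sign permutation scores are all zero")
    # Dividing by a positive number preserves the sign of the observed ES.
    return es / mean_abs


# Made-up permutation scores, for illustration only.
null_scores = [0.3, 0.2, -0.25, 0.4, -0.35]
nes_pos = normalized_es(0.6, null_scores)    # positive ES, scaled by the positive nulls
nes_neg = normalized_es(-0.17, null_scores)  # negative ES, scaled by the negative nulls
```

Because the denominator is typically well below 1, the ratio can land outside [-1, 1] even though the raw ES cannot.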
log10(pvalue) is a negative value. Therefore, in order for positive scores to be associated with upregulated genes (and negative scores with downregulated genes), one needs to rank the gene list by sign(FC) * (-log10(pvalue)).
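A small sketch of that ranking metric, written in Python for illustration. The gene names, fold changes, and p-values are made up, and `rank_metric` is a hypothetical helper, not a function from any GSEA package. It assumes the fold change is already on a log scale (so its sign marks up or down) and that p-values are strictly greater than 0.

```python
import math


def rank_metric(log2_fc, pvalue):
    """Signed significance: sign(FC) * (-log10(pvalue)).

    Assumes log2_fc is a log fold change and pvalue > 0
    (math.log10 raises ValueError for pvalue == 0).
    """
    sign = (log2_fc > 0) - (log2_fc < 0)  # +1, 0, or -1
    return sign * -math.log10(pvalue)


# Hypothetical differential-expression results: gene -> (log2 fold change, p-value)
results = {
    "GENE_A": (2.0, 0.001),   # strongly up
    "GENE_B": (-1.5, 0.01),   # clearly down
    "GENE_C": (0.3, 0.5),     # weakly up, not significant
}

# Sort so the most significantly upregulated genes come first
# and the most significantly downregulated genes come last.
ranked = sorted(results, key=lambda g: rank_metric(*results[g]), reverse=True)
```

With this ordering, a positive ES points at the upregulated end of the list and a negative ES at the downregulated end.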
Gene Set Enrichment Analysis (GSEA) is an analytical method to interpret gene expression data.
Algorithm:
Consider a list (say L) in which genes are ordered according to some measure of correlation with the phenotype. The aim of GSEA is to decide whether the members of a gene set tend to occur towards the top or the bottom of the ordered list L, rather than being spread randomly across it. The entire ranked list L is used to assess how the genes of each gene set are distributed. To do this, GSEA walks down the ranked list of genes, increasing a running-sum statistic when a gene belongs to the set and decreasing it when the gene does not.
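The enrichment score (ES) is the maximum deviation from zero encountered during that walk, i.e. the running-sum value of largest magnitude over the list L.
The sign of that value shows which end of L the gene set is concentrated at.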
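The walk described above can be sketched in a few lines of Python. This is a minimal illustration of a weighted running sum, assuming hits are weighted by |metric|^p and misses all carry equal weight; the function `enrichment_score` and the toy genes are hypothetical and this is not the code of any particular R package.

```python
def enrichment_score(ranked_genes, metric, gene_set, p=1.0):
    """Return the running-sum value with the largest deviation from zero.

    ranked_genes : genes sorted by their ranking metric, best first
    metric       : dict mapping gene -> ranking metric value
    gene_set     : set of genes to test
    p            : exponent weighting the hits (p = 0 gives equal steps)
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Step size for each hit; misses get a weight of 0 here.
    weights = [abs(metric[g]) ** p if h else 0.0
               for g, h in zip(ranked_genes, hits)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all hit weights are zero")

    running, es = 0.0, 0.0
    for w, h in zip(weights, hits):
        if h:
            running += w / total        # step up on a gene in the set
        else:
            running -= 1.0 / n_miss     # step down on a gene outside the set
        if abs(running) > abs(es):
            es = running                # keep the largest deviation from zero
    return es


# Toy ranked list, made up for illustration.
metric = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
ranked_genes = ["A", "B", "C", "D", "E", "F"]

es_top = enrichment_score(ranked_genes, metric, {"A", "B"})     # set at the top
es_bottom = enrichment_score(ranked_genes, metric, {"E", "F"})  # set at the bottom
es_mixed = enrichment_score(ranked_genes, metric, {"A", "D"})   # set spread out
```

Since the up-steps sum to 1 and the down-steps sum to 1, the running sum can never leave [-1, 1], which is why an ES of 84 or 238 suggests that a different quantity is being reported.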
Interpretation of ES value:
The higher the absolute value of the ES, the more strongly the gene set is concentrated at one end of the ranked list L.
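ES is a weighted Kolmogorov-Smirnov-like statistic. The exponent p is a tuning parameter that controls how strongly each hit is weighted by its ranking metric: with p = 0 the ES reduces to the standard Kolmogorov-Smirnov statistic, while with p = 1 each hit is weighted by the absolute value of its ranking metric.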
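The ES lies in [-1, 1]; the Normalized Enrichment Score (NES) rescales it to account for gene set size and is not confined to that interval. The sign of either score indicates which end of the ranked list, and therefore which direction of correlation with the expression data set, the gene set is associated with.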