I thought I understood the meaning of the positive and negative normalized enrichment scores (NES) produced by GSEA, but it just occurred to me that I don't know what to expect from a gene set that is enriched at both high and low ranks. For example, some gene sets (e.g., GO biological processes) contain genes that are co-regulated under different conditions but actually move in opposite directions (producing a mix of positive and negative fold-changes), so these genes fall at the very top and very bottom of the rank-ordered list that GSEA uses to compute the NES. So, do these high- and low-ranking genes cancel each other out, giving an NES close to zero, or does GSEA report whichever of the positive and negative enrichments is larger in absolute value?
Thanks for the quick response, Shawn! Your explanation makes sense. So, what do people typically do to capture enriched gene sets whose genes tend to change under the same conditions but include a mix of up- and down-regulated genes? The first thing that comes to mind is to run the analysis with ranks determined by the fold-change p-values, signed in the direction of the fold-change (which is what I've been doing), and then run a second analysis where ranks are determined by the unsigned p-values alone. It seems a little clunky to have to run two separate analyses, though, no?
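To make the two ranking schemes in this comment concrete, here is a minimal Python sketch. The gene names, fold-changes, and p-values are made-up toy values, and `rank_metrics` is a hypothetical helper, not part of GSEA. It only shows how a set with one strongly up-regulated and one strongly down-regulated member lands at opposite ends of a signed ranking but together at the top of an unsigned one.

```python
import math

def rank_metrics(log2fc, pvalue):
    """Return (signed, unsigned) ranking metrics for one gene.

    unsigned = -log10(p); signed carries the sign of the fold-change.
    The p-value is floored to avoid log10(0) (an assumption of this sketch).
    """
    strength = -math.log10(max(pvalue, 1e-300))
    signed = math.copysign(strength, log2fc)  # a fold-change of 0 is treated as positive
    return signed, strength

# Toy data (hypothetical): gene -> (log2 fold-change, p-value)
genes = {
    "A_up": (2.0, 1e-6),
    "A_down": (-2.0, 1e-6),
    "B": (0.3, 0.2),
    "C": (-0.2, 0.5),
    "D": (0.1, 0.8),
}

metrics = {name: rank_metrics(fc, p) for name, (fc, p) in genes.items()}

# Ranking for the first analysis: signed metric, most up-regulated first.
signed_order = sorted(metrics, key=lambda g: metrics[g][0], reverse=True)
# Ranking for the second analysis: unsigned metric, most significant first.
unsigned_order = sorted(metrics, key=lambda g: metrics[g][1], reverse=True)

print("signed:  ", signed_order)
print("unsigned:", unsigned_order)
```

Either ordering could then be written out as a pre-ranked list for whichever GSEA implementation you use.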
Rather than signing the p-value, it's recommended to rank by the t-statistic. Its magnitude tracks the p-value, and its sign follows the direction of the fold change.
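As a rough illustration of why the t-statistic already carries a sign, here is a small sketch that computes a Welch t-statistic for one gene from two groups of expression values. The function name `welch_t` and the numbers are assumptions for illustration only; in practice the statistic would normally come straight from your differential expression tool rather than being computed by hand.

```python
import math
from statistics import mean, variance

def welch_t(treated, control):
    """Welch's t-statistic: positive when the treated mean is higher."""
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se

# Toy expression values (hypothetical, e.g. log-scale)
t_up = welch_t([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])    # gene higher in treated
t_down = welch_t([1.0, 2.0, 3.0], [5.0, 6.0, 7.0])  # gene lower in treated

print(t_up, t_down)
```

Sorting genes by this value puts strongly up-regulated genes at the top of the list and strongly down-regulated genes at the bottom, with no separate signing step.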
My recommendation would be to perform a hypergeometric test (similar to a GO term enrichment, but using the gene sets of interest). I'd take the significantly up-regulated genes and run the test, then take the significantly down-regulated genes and run it again. If the same gene set comes up as significant in both tests, you have your answer.
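A minimal sketch of that two-sided approach, using only the Python standard library. The function names, the toy universe of 20 genes, and the gene lists are all hypothetical; the p-value is the usual upper-tail hypergeometric probability of seeing at least the observed overlap, and no multiple-testing correction is applied here.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, set_size, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from a universe containing set_size gene-set members."""
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, n_selected - k)
        for k in range(n_overlap, min(set_size, n_selected) + 1)
    )
    return tail / total

def enrich_both_directions(gene_set, up_genes, down_genes, universe):
    """Run the test separately on the up- and down-regulated gene lists."""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    result = {}
    for label, hits in (("up", up_genes), ("down", down_genes)):
        hits = set(hits) & universe
        overlap = len(hits & gene_set)
        result[label] = hypergeom_enrichment_p(
            len(universe), len(gene_set), len(hits), overlap
        )
    return result

# Toy example (hypothetical gene names)
universe = [f"g{i}" for i in range(20)]
gene_set = ["g0", "g1", "g2", "g3"]
up_genes = ["g0", "g1", "g4"]      # significantly up-regulated
down_genes = ["g2", "g3", "g5"]    # significantly down-regulated

result = enrich_both_directions(gene_set, up_genes, down_genes, universe)
print(result)
```

A gene set with a small p-value under both `"up"` and `"down"` is one whose members change in both directions.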
Makes sense. Thanks Shawn!