Hi,
I ran into an issue when running a pre-ranked GSEA on my gene list (created by DESeq2). When I ranked the genes by log2FoldChange, I got a large number of enriched gene sets that seemed fine. However, I also wanted to rank the gene list by the statistic sign(log2FoldChange) * -log10(Pvalue), both to verify my other GSEA results and to take statistical significance (the p-value) into account.
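To make the ranking statistic concrete, here is a minimal Python sketch of sign(log2FoldChange) * -log10(Pvalue). It is only an illustration: the function name `rank_metric`, the `min_pvalue` clamp (an assumed guard against p-values of exactly 0), and the example numbers are all hypothetical and not taken from the actual DESeq2 output.

```python
import math

def rank_metric(log2_fold_change, pvalue, min_pvalue=1e-300):
    """Return sign(log2FoldChange) * -log10(pvalue) for one gene."""
    # Assumption: clamp tiny p-values so that p == 0 does not give an infinite score.
    p = max(pvalue, min_pvalue)
    # Sign of the fold change: +1, -1, or 0.
    sign = (log2_fold_change > 0) - (log2_fold_change < 0)
    return sign * -math.log10(p)

# Made-up example values (log2FoldChange, p-value), not real results.
genes = {
    "geneA": (2.0, 0.001),
    "geneB": (-1.5, 0.01),
    "geneC": (0.3, 0.5),
}

# Sort from most positive to most negative score, as a pre-ranked list would be.
ranked = sorted(
    ((name, rank_metric(lfc, p)) for name, (lfc, p) in genes.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, score in ranked:
    print(name, round(score, 3))
```

Swapping the raw p-value for padj only changes the second argument, which is why the two rankings can be compared directly.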
When doing this, I first used the padj values (adjusted p-values from the DESeq2 output) as the "pvalue" in the ranking. I got no enriched gene sets, along with a message that 53.68% of the genes were ties: "There are ties in the preranked stats (53.68% of the list)." When I switched to the raw p-values, the message went away and I got a large number of enriched gene sets again.
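One way to check how widespread the ties are in a ranking vector is sketched below in Python. The definition used here (the share of entries whose value is shared with at least one other entry) is an assumption and may not match exactly how the GSEA tool computes its percentage; `tied_fraction` and the example values are hypothetical.

```python
from collections import Counter

def tied_fraction(stats):
    """Fraction of entries whose value occurs more than once in the list."""
    if not stats:
        return 0.0
    counts = Counter(stats)
    # Count every entry that belongs to a group of identical values.
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(stats)

# Made-up ranking statistics, not real data.
example = [3.0, 1.2, 1.2, 0.0, 0.0, 0.0, -2.5, -4.1]
print(f"{tied_fraction(example):.2%} of the list is tied")
```

Running this on both the padj-based and the raw p-value-based statistics would show which values are repeated and how often.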
In this case, is it acceptable to use the raw p-values instead of the padj values, and why? Furthermore, why are so many of the padj values identical?
Thanks!