Dear all, I have been running GSEA. Since I am very new to it, I am trying out both the R package clusterProfiler and the original GSEA software (GSEA_4.3.2). However, when I ran the same pre-ranked gene list through three methods (GSEA and gseGO from clusterProfiler, and GSEAPreranked from the Broad Institute's GSEA_4.3.2), I got different results.
With clusterProfiler, I ran tests similar to those in this post, using both the GSEA and gseGO functions on a pre-ranked list:
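```r
library(clusterProfiler)
library(org.Mm.eg.db)

# gene_list: named numeric vector (names = gene symbols), sorted in decreasing order
m5_gs <- read.gmt("m5.go.v2022.1.Mm.symbols.gmt")

# GSEA() with the MSigDB M5 (GO) gene sets read from the GMT file
gsea_res <- GSEA(gene_list,
                 minGSSize = 5,
                 maxGSSize = 800,
                 TERM2GENE = m5_gs,
                 pvalueCutoff = 0.25,
                 pAdjustMethod = "BH")

# gseGO() with GO annotations taken from org.Mm.eg.db (all three ontologies)
gseaGO_res <- gseGO(geneList = gene_list,
                    ont = "ALL",
                    keyType = "SYMBOL",
                    minGSSize = 5,
                    maxGSSize = 800,
                    pvalueCutoff = 0.25,
                    verbose = TRUE,
                    OrgDb = org.Mm.eg.db,
                    pAdjustMethod = "BH")
```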
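Both calls above drop gene sets whose size falls outside `minGSSize`/`maxGSSize` before testing. As far as I understand, the size is counted after intersecting each set with the genes present in the ranked list (the GSEA documentation describes `set_min`/`set_max` the same way), so the number of tested sets depends on the gene list as well as on the GMT file or OrgDb. The toy base-R sketch below illustrates that filter; `filter_gene_sets()` and the data are hypothetical and are not clusterProfiler internals.

```r
# Toy two-column term-to-gene table, like the one passed to TERM2GENE (made-up data)
term2gene <- data.frame(
  term = c(rep("SET_A", 3), rep("SET_B", 6), rep("SET_C", 2)),
  gene = c("g1", "g2", "g99",
           "g1", "g2", "g3", "g4", "g5", "g6",
           "g7", "g98")
)

# Hypothetical named ranking statistic, sorted in decreasing order
gene_list <- sort(setNames(c(2.1, 1.7, 0.9, 0.3, -0.2, -0.8, -1.5),
                           paste0("g", 1:7)), decreasing = TRUE)

# Keep gene sets whose overlap with the ranked genes lies in [min_size, max_size]
filter_gene_sets <- function(term2gene, ranked_genes, min_size, max_size) {
  sets <- split(term2gene$gene, term2gene$term)
  sets <- lapply(sets, intersect, ranked_genes)  # drop genes absent from the ranking
  sizes <- lengths(sets)
  sets[sizes >= min_size & sizes <= max_size]
}

kept <- filter_gene_sets(term2gene, names(gene_list), min_size = 2, max_size = 5)
names(kept)
```

Here SET_B is dropped for being too large after the intersection and SET_C for being too small, even though both are defined in the table.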
Lastly, I ran GSEAPreranked in the GSEA desktop software with the following parameters:
```
param set_min 5
param scoring_scheme weighted
param norm meandiv
param set_max 800
param gmx ftp.broadinstitute.org://pub/gsea/msigdb/mouse/gene_sets/m5.go.v2022.1.Mm.symbols.gmt
param nperm 5000
```
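These parameters map onto the standard GSEA procedure: `scoring_scheme weighted` is the running-sum statistic with exponent p = 1 (which, if I read the docs correctly, matches clusterProfiler's default `exponent = 1`), GSEAPreranked estimates significance by gene-set permutation with `nperm` random sets, and `norm meandiv` divides ES by the mean of the same-sign null ES. Below is a minimal base-R sketch under those assumptions; the helper names and the simulated data are hypothetical, and this is not the Broad implementation.

```r
# Weighted running-sum enrichment score; p = 1 corresponds to the "weighted" scheme
enrichment_score <- function(stats, in_set, p = 1) {
  # stats:  ranking statistic, sorted in decreasing order
  # in_set: logical vector, TRUE where the gene belongs to the gene set
  hit_w  <- abs(stats)^p * in_set           # hits step up in proportion to |stat|^p
  p_hit  <- cumsum(hit_w) / sum(hit_w)
  p_miss <- cumsum(!in_set) / sum(!in_set)   # misses step down uniformly
  running <- p_hit - p_miss
  running[which.max(abs(running))]           # ES = maximum deviation from zero
}

# Gene-set permutation: null ES from random gene sets of the same size
gene_set_permutation <- function(stats, in_set, nperm = 1000, p = 1) {
  obs <- enrichment_score(stats, in_set, p)
  k <- sum(in_set)
  null_es <- replicate(nperm, {
    random_set <- logical(length(stats))
    random_set[sample.int(length(stats), k)] <- TRUE
    enrichment_score(stats, random_set, p)
  })
  # Compare only against null ES with the same sign as the observed ES
  same_sign <- if (obs >= 0) null_es[null_es >= 0] else null_es[null_es < 0]
  pval <- if (obs >= 0) mean(same_sign >= obs) else mean(same_sign <= obs)
  # "meandiv": divide by the absolute mean of the same-sign null ES
  nes <- obs / abs(mean(same_sign))
  list(es = obs, nes = nes, pval = pval)
}

# Usage on simulated data (hypothetical gene set concentrated at the top)
set.seed(1)
stats <- sort(rnorm(200), decreasing = TRUE)
in_set <- seq_along(stats) %in% c(1:10, 40, 75)
res <- gene_set_permutation(stats, in_set, nperm = 1000)
res
```

In this scheme a nominal p-value is a count divided by the number of same-sign null scores, so its resolution is limited by `nperm`, whereas the ES itself does not depend on the permutations at all.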
GSEAPreranked reports only 1 gene set enriched at FDR < 25%, whereas GSEA from clusterProfiler gives 10 and gseGO gives 36. Digging a little deeper into the gene sets each method uses, I found that GSEAPreranked and GSEA both test the same 3460 gene sets, while gseGO tests 4332. All three produce the same ES and very similar NES, yet their p-values are quite different.

From the post mentioned above, I learned that the discrepancy between gseGO and GSEA comes from differences in the GO gene sets, but how can the discrepancy between GSEA and GSEAPreranked be explained? Also, which method is more suitable for this analysis?
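The FDR columns are not necessarily defined the same way. clusterProfiler adjusts the nominal p-values with the method given in `pAdjustMethod` ("BH" here), whereas the original GSEA method defines FDR for a given NES by comparing the fraction of permutation-null NES (pooled over all gene sets) at least that extreme with the fraction of observed NES at least that extreme, within the same sign. A simplified sketch of both on hypothetical numbers follows, assuming the null NES are already normalized; `gsea_style_fdr()` is a hypothetical helper that omits details of the Broad implementation, and whether this difference accounts for the gap seen here is an assumption, not something shown by these numbers.

```r
# Simplified GSEA-style FDR: how often the pooled permutation null reaches a given NES,
# relative to how often the observed gene sets reach it (same sign only)
gsea_style_fdr <- function(nes_obs, nes_null) {
  # nes_obs:  observed NES, one per gene set
  # nes_null: matrix of null NES, gene sets x permutations
  sapply(nes_obs, function(x) {
    if (x >= 0) {
      null_frac <- mean(nes_null[nes_null >= 0] >= x)
      obs_frac  <- mean(nes_obs[nes_obs >= 0] >= x)
    } else {
      null_frac <- mean(nes_null[nes_null < 0] <= x)
      obs_frac  <- mean(nes_obs[nes_obs < 0] <= x)
    }
    min(1, null_frac / obs_frac)
  })
}

# Toy numbers (hypothetical): 3 gene sets, 4 permutations each
nes_obs  <- c(2.0, 1.0, -1.5)
nes_null <- matrix(c(0.5,  1.2, -0.4, -1.8,
                     1.5, -0.9,  0.3, -0.2,
                     2.5, -1.1,  0.8, -1.6), nrow = 3, byrow = TRUE)
gsea_fdr <- gsea_style_fdr(nes_obs, nes_null)

# clusterProfiler-style adjustment: BH on nominal p-values (hypothetical values)
nominal_p <- c(0.01, 0.02, 0.04)
bh_fdr <- p.adjust(nominal_p, method = "BH")
```

The first approach depends on the NES distribution of every tested set, while BH depends only on the ranks of the nominal p-values, so the two can disagree even when ES and NES match.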
Thank you very much for your time and help!