Question

Dealing with very large gene-lists in GSEA

22 months ago

yura.grabovska ▴ 690

I'm using fgsea to do gene-set enrichment using the ENCODE transcription factor targets dataset.

However, some of the gene sets are very large, and I suspect that, because of how the normalisation step works, this is preventing the analysis from finding many significant enrichments. What is the most appropriate way to deal systematically with very large gene sets in GSEA?

From the GSEA User Guide: "Nevertheless, the normalization is not very accurate for extremely small or extremely large gene sets. For example, for gene sets with fewer than 10 genes, just 2 or 3 genes can generate significant results. Therefore, by default, GSEA ignores gene sets that contain fewer than 15 genes or more than 500 genes"

R fgsea GSEA • 638 views

updated 22 months ago by Trivas ★ 1.8k • written 22 months ago by yura.grabovska ▴ 690

score 1 · Answer 1 · 2023-03-29

IMO, in those cases you could look at the enrichment score (ES) instead of the normalised enrichment score (NES). Regardless, the fgsea function itself lets you set the minSize and maxSize parameters. The "quick guide" on GitHub shows the recommended values as 15 and 500, like you mentioned.

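library(fgsea)
data(examplePathways)  # example gene sets shipped with fgsea
data(exampleRanks)     # example ranked gene-level statistics
fgseaRes <- fgsea(pathways = examplePathways,
                  stats    = exampleRanks,
                  minSize  = 15,
                  maxSize  = 500)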

However, if you look at the help documentation within R (e.g. ?fgsea) you see:

fgsea(
  pathways,
  stats,
  minSize = 1,
  maxSize = length(stats) - 1,
  gseaParam = 1,
  ...
)

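This means that if you run fgsea without changing any parameters, no size filtering is effectively applied: it will report gene sets of any size from 1 up to one less than the number of genes you have stat values for. To get the GSEA-style 15 to 500 cutoffs, you have to set minSize and maxSize yourself.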
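
Before choosing cutoffs, it can help to see how large each gene set actually is once it is restricted to the genes in your ranked list, since, as far as I understand, that overlap is the size minSize and maxSize are compared against. Below is a minimal base-R sketch using made-up data; the gene names, scores, and the three sets (tiny, medium, huge) are all hypothetical and only illustrate the filtering logic, not fgsea's internal code.

```r
# Toy ranked statistics: 1000 hypothetical genes with made-up scores.
set.seed(1)
stats <- setNames(sort(rnorm(1000), decreasing = TRUE), paste0("gene", 1:1000))

# Hypothetical gene sets of very different sizes; "huge" also contains
# 200 genes that are absent from the ranked list.
pathways <- list(
  tiny   = paste0("gene", 1:5),
  medium = paste0("gene", seq(1, 1000, by = 10)),
  huge   = paste0("gene", 1:1200)
)

# Effective size = number of unique set members present in the ranked list.
effectiveSize <- vapply(
  pathways,
  function(genes) length(intersect(genes, names(stats))),
  integer(1)
)

# Same cutoffs as in the quick guide example.
minSize <- 15
maxSize <- 500
keep <- effectiveSize >= minSize & effectiveSize <= maxSize

# Overview of which sets would survive the size filter.
print(data.frame(set = names(pathways), effectiveSize, keep, row.names = NULL))

# Only the sets that pass the filter would be handed on to fgsea.
filteredPathways <- pathways[keep]
```

With your real data you would swap in your own ranked vector and the ENCODE target lists, then look at the distribution of effectiveSize to judge how many sets a given maxSize would exclude before committing to a cutoff.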