fgsea nMoreExtreme meaning
4.5 years ago
jack.henry ▴ 50

I've been using fgsea to run some enrichment analysis and one of the outputs from fgsea() is nMoreExtreme.

I've seen from the vignette that the nMoreExtreme is:

a number of times a random gene set had a more extreme enrichment score value

I was wondering what this actually means?

Like what is a random gene set? Is it just a random selection of genes? But then would nMoreExtreme be different every time fgsea is run?

Is it another measure of how reliable the enrichment analysis is?

I am guessing that a value of 0 or 1 is good? And a value of a few thousand is bad?

Could I use it alongside a padj value to determine if the results are significant or is it not as important as padj?

Thanks in advance.

RNA-Seq gsea r fgsea • 4.5k views

Hello, may I ask what you mean by there being no nperm at all in the latest version of fgsea? I installed this package in R today and got different results when running fgsea(pathways=samplepathway, stats=ranks, minSize = 15, maxSize = 500, nperm=1000, nproc=1) vs fgsea(pathways=samplepathway, stats=ranks, minSize = 15, maxSize = 500, nproc=1).


Please do not use the answer field unless you have an answer to the toplevel question.

?fgsea helps. Recent versions have a function fgsea that wraps both fgseaSimple and fgseaMultilevel. The former is the "traditional" and computationally expensive method based on permutations. The latter is a more efficient method based on an "adaptive multilevel splitting Monte Carlo approach" (quoted from fgseaMultilevel), which does not require permutations (from what I understand). The fgsea preprint on bioRxiv explains the details, and for specific questions we fortunately have alserg (the maintainer of fgsea) here for expertise.


In the recent version, fgsea has switched to an algorithm that doesn't have the nperm parameter. For compatibility reasons, fgsea called with the nperm parameter currently executes the old version, and without it, the new one. That's why you see a big difference in results.
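To avoid depending on which path the fgsea wrapper takes, the two algorithms can be called by name. The following is a minimal sketch, assuming the fgsea package is installed and using its bundled examplePathways and exampleRanks data in place of your own pathways and ranks; the seed and the variable names are arbitrary choices for illustration.

```r
# Guard so the script still runs where fgsea is not installed.
has_fgsea <- requireNamespace("fgsea", quietly = TRUE)

if (has_fgsea) {
  library(fgsea)
  data(examplePathways)  # example gene sets shipped with fgsea
  data(exampleRanks)     # example ranked gene statistics shipped with fgsea

  set.seed(42)  # both methods are randomized; fix the seed to compare runs

  # Old permutation-based method: takes nperm and reports nMoreExtreme.
  res_simple <- fgseaSimple(pathways = examplePathways, stats = exampleRanks,
                            nperm = 1000, minSize = 15, maxSize = 500,
                            nproc = 1)

  # New multilevel method: no nperm; reports an error estimate (log2err).
  res_multi <- fgseaMultilevel(pathways = examplePathways, stats = exampleRanks,
                               minSize = 15, maxSize = 500, nproc = 1)

  print(colnames(res_simple))
  print(colnames(res_multi))
}
```

Comparing the column names of the two results shows where nMoreExtreme appears and where the error estimate replaces it.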

4.5 years ago
alserg ▴ 980

GSEA P-values are calculated empirically by sampling random gene sets. nMoreExtreme is indeed the number of times a random gene set had a more extreme enrichment score, and it is used to calculate the nominal GSEA P-value, which is equal to nMoreExtreme normalized by the number of random gene sets whose enrichment score has the same sign. Because the sampling is random, the value can differ from run to run. If it is zero, the calculated P-value is bounded by the number of permutations and is not the real P-value, so the result can be a huge overestimation (or not).
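The calculation described above can be sketched in base R. This is an illustration only, not fgsea's implementation: enrichment_score is a simplified, unweighted running-sum statistic, empirical_pvalue is a hypothetical helper, and the +1 added to numerator and denominator is an assumed convention that keeps the estimate above zero.

```r
# Simplified stand-in for a GSEA enrichment score: an unweighted
# Kolmogorov-Smirnov-style running sum over the ranked gene list.
enrichment_score <- function(in_set) {
  n <- length(in_set)
  k <- sum(in_set)
  # Step up at genes in the set, step down at all other genes.
  steps <- ifelse(in_set, 1 / k, -1 / (n - k))
  running <- cumsum(steps)
  running[which.max(abs(running))]  # signed maximum deviation from zero
}

# Empirical P-value for a gene set given as positions in the ranked list.
empirical_pvalue <- function(n_genes, set_idx, nperm) {
  observed <- enrichment_score(seq_len(n_genes) %in% set_idx)
  k <- length(set_idx)
  # A "random gene set" is a random draw of k genes from the ranked list.
  random_es <- replicate(nperm, {
    enrichment_score(seq_len(n_genes) %in% sample.int(n_genes, k))
  })
  same_sign <- if (observed >= 0) random_es >= 0 else random_es <= 0
  n_more_extreme <- sum(same_sign & abs(random_es) >= abs(observed))
  list(ES = observed,
       nMoreExtreme = n_more_extreme,
       pval = (n_more_extreme + 1) / (sum(same_sign) + 1))
}

set.seed(1)
# A set made of the 20 top-ranked genes: as enriched as possible.
top_set <- empirical_pvalue(1000, 1:20, nperm = 1000)
set.seed(1)
# A set of 20 genes scattered at random through the list.
random_set <- empirical_pvalue(1000, sample.int(1000, 20), nperm = 1000)

print(top_set)
print(random_set)
```

For the top-ranked set, no random draw comes close to the observed score, so nMoreExtreme is 0 and the reported P-value only reflects how many random sets were sampled.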

That said, in the recent release the algorithm has been changed to fgsea-multilevel: there are no more nMoreExtreme values, and the estimation errors are reported explicitly.


Thank you so much, that's interesting! Okay so if I am seeing a bunch of nMoreExtreme values of 0 then it could be because my data is significant or I don't have enough permutations? Do you know what an optimal number of permutations is? I guess it depends on your data? Is it best to leave it as the default?


If you do, say, 10k permutations and see nMoreExtreme = 0, you can't really tell whether the nominal P-value is 1e-4, 1e-10 or 1e-100.
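The floor on the reportable P-value can be worked out directly. A small sketch, assuming the estimate takes the form (nMoreExtreme + 1) / (n_same_sign + 1); min_reportable_p is a hypothetical helper name, not an fgsea function.

```r
# Smallest P-value a permutation scheme can report when nMoreExtreme = 0,
# assuming the (nMoreExtreme + 1) / (n_same_sign + 1) convention.
min_reportable_p <- function(n_same_sign) 1 / (n_same_sign + 1)

nperm <- c(1e3, 1e4, 1e5)

# Best case: every random gene set has the same sign as the observed score.
floor_p <- min_reportable_p(nperm)

# If only about half of the random sets share the sign, the floor is higher.
floor_p_half <- min_reportable_p(nperm / 2)

print(data.frame(nperm, floor_p, floor_p_half))
```

Under these assumptions, lowering the floor by one order of magnitude costs ten times as many permutations, which is why very small P-values are impractical to resolve this way.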

Again, the optimal way is to update to the latest version of fgsea: there is no nperm parameter at all and arbitrarily low P-values can be estimated.
