I am doing a Preranked GSEA Analysis on the output of a differential expression analysis (genes ranked by sign(shrunken lfc) * -log10(pvalue) ).
I notice that when I feed the exact same files into GSEA, it can give me fairly different FDRs.
For example, when I ran the analysis the first time, it said it detected "32 gene sets are significantly enriched at FDR < 25%".
Ten minutes later, I ran the same files again, and it detected "4 gene sets are significantly enriched at FDR < 25%".
I know that there is some element of randomness in the GSEA algorithm because it sets a seed based on timepoint, but I'm surprised to see such a large difference in the number of pathways passing threshold from run to run.
Is this to be expected?
Thanks for your help.
UPDATE I also notice that the four gene sets called enriched in Round 2 are completely non-overlapping with the thirty-two gene sets enriched in Round 1. This is surprising and a little concerning to me.
Did you run the third and fourth times? Are you sure you didn't change any settings between the first two times?
that is startling - are you sure you did not reset some other parameter to the analysis? What software are you using? MIT/Broad's Java application?
That is correct - I am running MIT/Broad's Java Application (64 bit 8 GB version for Mac). I imported the necessary files in once, clicked the "GSEAPreranked" tab, set up the appropriate files in the drop-down boxes, then pressed "Run" - then waited for that to run - then when it finished, pressed "Run" again - and repeated this a total of four times. Number of genes in that FDR < 25% list were:
Round 1: 32
Round 2: 4
Round 3: 15
Round 4: 55
Although the gene sets called significant were different each time, they were "thematically" similar - so, for example, I kept seeing metabolism, adherens junction, and fertilization-related gene sets.
What happens if you lower the FDR to 5% and run several times?
The way the GUI is set up you do not have the option to set a FDR pre-run - rather, it generates a new leading edge analysis each time, and then you set the FDA when interpreting the list/deciding what to visualize (although, in my case, the list would still be different each time, even if you do change the FDA for interpretation).
Here are a couple of Prerank analysis variables I did play with:
scoring_scheme: weighted or classic. (Both of these lead to variable gene set lists, as long as rnd_seed is set to Timepoint) (Classic may be less variable overall - in my tests, DE genes w/FDR<25% were 103, 96, 101).
rnd_seed: Timepoint, or 149: I suspect this is the culprit, because when I set the seed to 149, I get the same results every time.
It would make sense to me if the random element introduced by Timepoint caused a little shifting, but this seems very dramatic.
Getting the same results with the same seed is expected, it would be a bug otherwise. Get different results with different seed is also expected - the problem is, if the analysis is robust, the difference should be small.
Did you try correlating FDR or p-values from different runs?
I did! The data is puzzling because the effect seems to be of a different magnitude on FDR q-values vs. other types of metrics generated by GSEA.
Code:
Results:
t = 703.84, df = 2967, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.9967966 0.9972255 sample estimates: cor 0.9970187
t = 3655400000, df = 2967, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 1 1 sample estimates: cor 1
t = 339.5, df = 2967, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.9864366 0.9882444 sample estimates: cor 0.9873726
I furthermore explored which were the outlier points you see on the FDR q-value graph. For these, I observed that the difference between the q-value score in the first and the second analysis could be as high as 0.73. Some culprits:
(Incidentally, these hits are a few of the 'themes' I see when I run and re-run analyses.
I'm not sure what makes these pathways different. They're not especially big. The max rank can be anywhere from 2 to 432.
I'm not sure how the rank would stay the same but the FDR q-value would change so dramatically. Any ideas?
Hi Kristin, did you ever follow-up on this issue or found a culprit?
[Moved this text to an answer - see below]
I suggest you move this to an answer, as it is a throughout explanation to your initial question.