Question

Normal to have high variability in Broad Institute's GSEA Output?

1

Entering edit mode

6.2 years ago

Kristin Muench ▴ 640

I am doing a Preranked GSEA Analysis on the output of a differential expression analysis (genes ranked by sign(shrunken lfc) * -log10(pvalue) ).

I notice that when I feed the exact same files into GSEA, it can give me fairly different FDRs.

For example, when I ran the analysis the first time, it said it detected "32 gene sets are significantly enriched at FDR < 25%".

Ten minutes later, I ran the same files again, and it detected "4 gene sets are significantly enriched at FDR < 25%".

I know that there is some element of randomness in the GSEA algorithm because it sets a seed based on timepoint, but I'm surprised to see such a large difference in the number of pathways passing threshold from run to run.

Is this to be expected?

Thanks for your help.

UPDATE I also notice that the four gene sets called enriched in Round 2 are completely non-overlapping with the thirty-two gene sets enriched in Round 1. This is surprising and a little concerning to me.

GSEA GO • 3.7k views

ADD COMMENT • link 5.7 years ago by Kristin Muench ▴ 640

0

Entering edit mode

Did you run the third and fourth times? Are you sure you didn't change any settings between the first two times?

ADD REPLY • link 6.2 years ago by h.mon 35k

0

Entering edit mode

that is startling - are you sure you did not reset some other parameter to the analysis? What software are you using? MIT/Broad's Java application?

ADD REPLY • link 6.2 years ago by Malcolm.Cook ★ 1.5k

0

Entering edit mode

That is correct - I am running MIT/Broad's Java Application (64 bit 8 GB version for Mac). I imported the necessary files in once, clicked the "GSEAPreranked" tab, set up the appropriate files in the drop-down boxes, then pressed "Run" - then waited for that to run - then when it finished, pressed "Run" again - and repeated this a total of four times. Number of genes in that FDR < 25% list were:

Round 1: 32

Round 2: 4

Round 3: 15

Round 4: 55

Although the gene sets called significant were different each time, they were "thematically" similar - so, for example, I kept seeing metabolism, adherens junction, and fertilization-related gene sets.

ADD REPLY • link 6.2 years ago by Kristin Muench ▴ 640

0

Entering edit mode

What happens if you lower the FDR to 5% and run several times?

ADD REPLY • link 6.2 years ago by h.mon 35k

0

Entering edit mode

The way the GUI is set up you do not have the option to set a FDR pre-run - rather, it generates a new leading edge analysis each time, and then you set the FDA when interpreting the list/deciding what to visualize (although, in my case, the list would still be different each time, even if you do change the FDA for interpretation).

Here are a couple of Prerank analysis variables I did play with:

scoring_scheme: weighted or classic. (Both of these lead to variable gene set lists, as long as rnd_seed is set to Timepoint) (Classic may be less variable overall - in my tests, DE genes w/FDR<25% were 103, 96, 101).
rnd_seed: Timepoint, or 149: I suspect this is the culprit, because when I set the seed to 149, I get the same results every time.

It would make sense to me if the random element introduced by Timepoint caused a little shifting, but this seems very dramatic.

ADD REPLY • link 6.2 years ago by Kristin Muench ▴ 640

0

Entering edit mode

Getting the same results with the same seed is expected, it would be a bug otherwise. Get different results with different seed is also expected - the problem is, if the analysis is robust, the difference should be small.

Did you try correlating FDR or p-values from different runs?

ADD REPLY • link 6.2 years ago by h.mon 35k

1

Entering edit mode

I did! The data is puzzling because the effect seems to be of a different magnitude on FDR q-values vs. other types of metrics generated by GSEA.

Code:

# Set up dataset
varsToSave <- c('NAME', 'NOM p-val', 'FWER p-value', 'RANK AT MAX', 'FDR q-val')

plotData <- merge(round1[,names(round1) %in% varsToSave],
             round2[,names(round2) %in% varsToSave],
             by='NAME')

# Run Simple Correlation
testCor <- function(plotData, var1, var2, title){

  #plot
  plot( plotData[,var1], plotData[,var2], main=title)

  # statistical test of correlation
  myTest <- cor.test(plotData[,var1], plotData[,var2])
  print('....')
  print(title)
  print(myTest)
  print('....')
}

# Use function
testCor(plotData, "NOM p-val.x", "NOM p-val.y", 'Nominal p-value between GSEA Run 1 and 2')

testCor(plotData, "FDR q-val.x", "FDR q-val.y", 'FDR q-value between GSEA Run 1 and 2')

testCor(plotData, "RANK AT MAX.x", "RANK AT MAX.y", 'Rank at Max between GSEA Run 1 and 2')

Results:

Correlation1 t = 703.84, df = 2967, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.9967966 0.9972255 sample estimates: cor 0.9970187

Correlation 2 t = 3655400000, df = 2967, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 1 1 sample estimates: cor 1

Correlation 3 t = 339.5, df = 2967, p-value < 2.2e-16 alternative hypothesis: true correlation is not equal to 0 95 percent confidence interval: 0.9864366 0.9882444 sample estimates: cor 0.9873726

I furthermore explored which were the outlier points you see on the FDR q-value graph. For these, I observed that the difference between the q-value score in the first and the second analysis could be as high as 0.73. Some culprits:

NICOTINAMIDE NUCLEOTIDE BIOSYNTHETIC PROCESS (50 genes)
REGULATION OF CELL SHAPE%GOBP%GO:0008360 (92 genes)
PID_HIF1_TFPATHWAY%MSIGDB_C2%PID_HIF1 (61)
NEGATIVE REGULATION OF CELL CYCLE G2/M PHASE (48 genes)
INSULIN IGF PATHWAY-MITOGEN ACTIVATED PROTEIN (28 genes)

(Incidentally, these hits are a few of the 'themes' I see when I run and re-run analyses.

I'm not sure what makes these pathways different. They're not especially big. The max rank can be anywhere from 2 to 432.

I'm not sure how the rank would stay the same but the FDR q-value would change so dramatically. Any ideas?

ADD REPLY • link 6.2 years ago by Kristin Muench ▴ 640

0

Entering edit mode

Hi Kristin, did you ever follow-up on this issue or found a culprit?

ADD REPLY • link 5.7 years ago by Iván ▴ 60

1

Entering edit mode

[Moved this text to an answer - see below]

ADD REPLY • link 5.7 years ago by Kristin Muench ▴ 640

1

Entering edit mode

I suggest you move this to an answer, as it is a throughout explanation to your initial question.

ADD REPLY • link 5.7 years ago by h.mon 35k

score 4 · Accepted Answer · 2019-03-01

It seems the problem is either that I needed to run more permutations or just that I had a very weak phenotype.

Pablo Tamayo at UCSD wrote the following in response to my question on the gsea-help listserv: " If you explicitly set the seed and get the same results then that is the expected behavior. The variability across different seeds (runs) is also expected to be higher on the FRDs than other statistics. The lack of robustness you are observing can be related with not running enough permutations and can be exacerbated because the results are rather weak to begin with. Have you tried running a much larger number of permutations?"

In response to his prompt, I reran the analysis with a higher number of permutations (initial # of permutations was 1000). The pathways called significantly enriched by FDR were still very variable, but the pathways called significantly enriched by nominal p-value much less so.

I decided eventually that this likely meant I was trying to chase down a very tiny-or-nonexistent phenotype, as I have since analyzed other databases with much more obvious and stable outcomes.