Question

Gene set enrichment analysis gives a lot of "significant pathways"

0

7 weeks ago

Bine ▴ 90

Good morning,

I was wondering if someone could advise me on the following: I have performed a GSEA on the KEGG_LEGACY, Reactome and Hallmark gene sets using fgsea.

I receive a lot of significant pathways as hits:

For example, in Hallmarks most of the pathways (I think there are 50 in total) are significant; only around 20 are not.
In KEGG_LEGACY and Reactome there are also a lot of significant pathways.

I don't know if this is possible, and if so, is there a way to make the results a bit easier to interpret? How can I make sense of all this?

Thanks a lot!

KEGG GESA Hallmarks • 573 views

ADD COMMENT • link 6 weeks ago by Bine ▴ 90

1

What is the dataset you are analysing and what statistic are you using as input? Also what is your cutoff for significance?

ADD REPLY • link 7 weeks ago by yura.grabovska ▴ 690

0

Good afternoon, and thank you for your answer. My dataset is TCGA colorectal cancer, and I am using the stat column from my DESeq2 results as input for fgsea. My cutoff for significance is an adjusted p-value of 0.05.

Please also find my code below:

# Prepare results from DESeq2: sort genes by decreasing test statistic
res <- res[order(-res$stat), ]
ranks <- res$stat
names(ranks) <- res$hgnc_symbol

# Prepare pathways
....

# Run GSEA
fgseaRes <- fgsea(pathways = pathways.reactome, stats = ranks)
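
One thing worth checking in the ranking step above: the names come from hgnc_symbol, and annotation-mapped tables can contain NA, empty, or duplicated symbols. Below is a minimal sketch of cleaning the vector before passing it as stats. It uses an invented toy table in place of res and a hypothetical helper called make_ranks. Keeping the entry with the largest absolute statistic per symbol is just one possible choice, not something fgsea prescribes.

```r
# Toy stand-in for the DESeq2 results table (all values are invented).
res_toy <- data.frame(
  hgnc_symbol = c("TP53", "KRAS", NA, "APC", "KRAS", ""),
  stat        = c(4.2, -1.5, 3.0, NA, -6.1, 0.7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a named statistic vector sorted in decreasing order.
make_ranks <- function(res) {
  # Drop rows with a missing statistic or a missing/empty gene symbol.
  ok <- !is.na(res$stat) & !is.na(res$hgnc_symbol) & res$hgnc_symbol != ""
  res <- res[ok, , drop = FALSE]
  # Order by absolute statistic so the first row per symbol is the strongest one.
  res <- res[order(-abs(res$stat)), , drop = FALSE]
  res <- res[!duplicated(res$hgnc_symbol), , drop = FALSE]
  ranks <- res$stat
  names(ranks) <- res$hgnc_symbol
  sort(ranks, decreasing = TRUE)
}

ranks_toy <- make_ranks(res_toy)
print(ranks_toy)
```

The resulting vector has unique names and no missing values, so it can be handed to the stats argument in the same way as ranks above.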

Thank you!

ADD REPLY • link 7 weeks ago by Bine ▴ 90

1

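I am not sure DESeq2 is a good choice when you have hundreds of samples in each group; see this paper on inflated false positives from popular differential expression methods on large population-scale datasets: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02648-4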

ADD REPLY • link 7 weeks ago by Ming Tommy Tang ★ 4.5k

1

FWIW, this issue wouldn't affect GSEA per se. The guideline for GSEA is to run the whole dataset ranked by a statistic (e.g. LFC) rather than pre-filtering by significance, since GSEA does its own permutation testing. In that sense, poorly calibrated FDR control in your DE analysis shouldn't affect downstream GSEA results unless you are specifically pre-filtering on DE significance, in which case you are already biasing your GSEA output.
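
To make the distinction concrete, here is a small sketch on an invented table (de_toy and its values are assumptions, not the TCGA data). The vector intended for GSEA keeps every tested gene; the filtered variant is shown only to illustrate the input that is being advised against.

```r
# Invented DE results; column names mirror a DESeq2 results table.
de_toy <- data.frame(
  gene = paste0("gene", 1:8),
  stat = c(5.1, 2.3, 0.4, -0.2, -0.9, -1.8, -3.6, 0.1),
  padj = c(0.001, 0.03, 0.70, 0.90, 0.40, 0.08, 0.002, 0.95)
)

# Recommended input: all genes, ranked by the statistic.
ranks_all <- sort(setNames(de_toy$stat, de_toy$gene), decreasing = TRUE)

# Discouraged input: only genes that pass a DE significance cutoff.
sig <- de_toy[de_toy$padj < 0.05, ]
ranks_sig <- sort(setNames(sig$stat, sig$gene), decreasing = TRUE)

length(ranks_all)  # all 8 genes are kept
length(ranks_sig)  # only 3 genes survive the cutoff
```

With the filtered vector, the enrichment test only sees genes that were already selected for strong effects, which is the bias described above.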

ADD REPLY • link 7 weeks ago by yura.grabovska ▴ 690

0

I ran it on all 18,000 genes (not only the significant ones). The full list had about 30,000 genes, but I pre-filtered it as recommended in the DESeq2 workflow:

smallestGroupSize <- 66 #smallest group size
keep <- rowSums(counts(dds0) >= 10) >= smallestGroupSize
dds0 <- dds0[keep,]
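
For readers unfamiliar with this filter: it keeps a gene only if at least smallestGroupSize samples have a count of 10 or more. Here is a minimal sketch of the same logic on an invented count matrix, where a plain matrix (counts_toy) stands in for counts(dds0) and the threshold is lowered from 66 to 3 so the toy data fits.

```r
# Toy count matrix: 4 genes x 6 samples (all values are invented).
counts_toy <- matrix(
  c(50, 60, 12, 80, 30, 25,   # geneA: >= 10 in all 6 samples
     0,  3, 15,  2, 11,  0,   # geneB: >= 10 in 2 samples
    10, 10, 10,  9,  0,  0,   # geneC: >= 10 in 3 samples
     0,  0,  0,  0,  1,  0),  # geneD: >= 10 in no sample
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:6))
)

smallest_group_size <- 3  # 66 in the post; 3 here to suit the toy data

# Count, per gene, the samples with at least 10 reads, then apply the threshold.
keep_toy <- rowSums(counts_toy >= 10) >= smallest_group_size
print(keep_toy)

filtered_toy <- counts_toy[keep_toy, , drop = FALSE]
```

The filter depends only on the counts, not on the group labels or the test results, so it is a different kind of filtering from selecting genes by DE significance.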

ADD REPLY • link 6 weeks ago by Bine ▴ 90