Question

How to decide what gene set libraries to use in your GSEA analysis


Jen, 18 months ago

This is probably more of a philosophical question. I'm sure you could choose whichever library you want (e.g. Hallmark, Reactome, KEGG), but I'm curious whether there are known benefits of using one (or a few) over the others. Are there circumstances where one library (or a combination) would be better than another? For instance, I'd imagine that if you are answering a question related to cancer, you would probably use Hallmark. Another question is how to interpret a process coming up as enriched with one library but not with another. I know the gene sets are not identical, but is there a better way to interpret this?

I am using GSEA to identify which processes are happening in my cells. GO BP is too broad and returns too many terms to be specific. Currently I'm using KEGG, Hallmark and maybe Reactome, but Reactome also returns a lot of terms. This is what ultimately led me to ask this question. Do you just pull out the results that help you tell your story? That seems very arbitrary to me.
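
For the second question, one concrete check is how much the two libraries' gene sets for "the same" process actually overlap. The sketch below is a minimal, hypothetical example: the function names and set contents are placeholders (in practice you would load the sets from each library's own files), and Jaccard overlap is just one simple way to compare them. If the overlap is low, the two libraries are effectively testing different groups of genes, so a disagreement between them is not very surprising.

```python
# Hypothetical helpers for comparing two libraries' versions of one process.
# The gene sets used below are toy placeholders, not real KEGG/Hallmark/Reactome contents.


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection size divided by union size (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_sets(set_a: set[str], set_b: set[str]) -> dict[str, object]:
    """Summarise which genes are shared and which are unique to each set."""
    return {
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        "jaccard": jaccard(set_a, set_b),
    }


# Toy example: the "same" process as defined by two different libraries.
library_a_process = {"G1", "G2", "G3", "G4", "G5", "G6"}
library_b_process = {"G4", "G5", "G6", "G7", "G8"}
summary = compare_sets(library_a_process, library_b_process)
print(summary["shared"], summary["jaccard"])  # ['G4', 'G5', 'G6'] 0.375
```

Looking at the genes in `only_a` and `only_b` can also show whether the leading-edge genes driving an enrichment in one library are simply absent from the other library's set.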



By the way, in cancer papers I often see the H (hallmark), C2 (curated gene sets, which include the KEGG and Reactome pathways) and C5 (ontology gene sets) collections from MSigDB being tested. But in general, gene set enrichment tests are more of a smoke-and-mirrors situation than hard science, and their results should always be validated in the wet lab. There are lots of discussions on this forum about that, but of particular note, in my opinion, would be this one.

e.r.zakiev, 18 months ago
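
MSigDB distributes its collections (H, C2, C5 and others) as GMT files, a tab-separated text format in which each line holds a gene set name, a description field, and then the member genes. A minimal sketch for loading several collections into one dictionary, so they can be tested together or compared, might look like the following. The function names and the toy set names are hypothetical; with real data you would pass open file handles for the GMT files you downloaded instead of the `io.StringIO` stand-ins.

```python
import io
from typing import TextIO


def read_gmt(handle: TextIO) -> dict[str, set[str]]:
    """Parse GMT lines of the form: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets: dict[str, set[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}  # drop empty trailing fields
    return gene_sets


def load_collections(handles: dict[str, TextIO]) -> dict[tuple[str, str], set[str]]:
    """Key every gene set by (collection label, set name) so names from different files cannot clash."""
    combined: dict[tuple[str, str], set[str]] = {}
    for label, handle in handles.items():
        for name, genes in read_gmt(handle).items():
            combined[(label, name)] = genes
    return combined


# Toy stand-ins for real GMT files (hypothetical set names and genes).
h_text = "TOY_SET_1\tplaceholder description\tG1\tG2\tG3\n"
c2_text = "TOY_SET_2\tplaceholder description\tG2\tG4\t\n"
combined = load_collections({"H": io.StringIO(h_text), "C2": io.StringIO(c2_text)})
print(sorted(combined))  # [('C2', 'TOY_SET_2'), ('H', 'TOY_SET_1')]
```

Keeping the collection label in the key makes it easy to report later which library a hit came from.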

Answer (score 1), 2023-10-05

GO BP is indeed too broad and redundant: the same genes can make both a term AND its parent term(s) come up as hits. This is especially true if you use a simple over-representation test, so it's good that you use GSEA, as it takes into account the rank of the genes in your gene list.
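
The difference mentioned above (an over-representation test only sees which genes passed a cutoff, while GSEA uses where a set's genes fall in the full ranked list) can be sketched in plain Python. This is a minimal illustration under stated assumptions: `ora_pvalue` is a one-sided hypergeometric test, and `enrichment_score` follows the weighted running-sum enrichment score used by GSEA with weight `p = 1`, but it omits the permutation-based significance testing and score normalisation that the actual GSEA software performs. Function names, genes and scores are hypothetical.

```python
import math


def ora_pvalue(n_universe: int, n_in_set: int, n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_overlap) for an over-representation test."""
    upper = min(n_in_set, n_selected)
    tail = sum(
        math.comb(n_in_set, i) * math.comb(n_universe - n_in_set, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / math.comb(n_universe, n_selected)


def enrichment_score(ranked: list[tuple[str, float]], gene_set: set[str], p: float = 1.0) -> float:
    """GSEA-style running-sum enrichment score for a list sorted by score, highest first.

    Hits step up in proportion to |score|**p; misses step down by 1 / (number of misses).
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    is_hit = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(is_hit)
    hit_weight = sum(abs(score) ** p for (_, score), hit in zip(ranked, is_hit) if hit)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, is_hit):
        running += abs(score) ** p / hit_weight if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical genes and scores), highest score first.
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0), ("G4", 0.5), ("G5", 0.2),
          ("G6", -0.2), ("G7", -0.5), ("G8", -1.0), ("G9", -2.0), ("G10", -3.0)]
top_set = {"G1", "G2", "G3"}
bottom_set = {"G8", "G9", "G10"}

# ORA only sees membership in a thresholded list, e.g. "the top 5 genes are significant":
print(ora_pvalue(n_universe=10, n_in_set=3, n_selected=5, n_overlap=3))
# The GSEA-style score uses where the genes sit in the whole ranking (the sign shows direction):
print(enrichment_score(ranked, top_set), enrichment_score(ranked, bottom_set))
```

In this toy case the top set scores close to +1 and the bottom set close to -1, whereas the ORA result depends entirely on where the significance cutoff was drawn.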

There are resources that try to reduce the clutter by removing redundant terms; of note are PANTHER DB with its "slim" GO ontologies, as well as the GO slims.
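
The idea behind GO slims (mapping many fine-grained terms up the ontology onto a small set of high-level terms) can be sketched as a walk up the GO graph. This is a minimal, hypothetical illustration rather than a reproduction of PANTHER's or the GO Consortium's own mapping tools: the term IDs and the tiny parent map are placeholders, not real GO terms, all parent relations are treated alike, and every slim ancestor is reported rather than only the most specific one.

```python
from collections import Counter, deque


def slim_ancestors(term: str, parents: dict[str, list[str]], slim_terms: set[str]) -> set[str]:
    """Walk up the parent edges from `term` and collect every slim term reached (including `term` itself)."""
    found: set[str] = set()
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in slim_terms:
            found.add(current)
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


def collapse_to_slim(enriched: list[str], parents: dict[str, list[str]], slim_terms: set[str]) -> Counter:
    """Count how many enriched fine-grained terms map onto each slim term."""
    counts: Counter = Counter()
    for term in enriched:
        counts.update(slim_ancestors(term, parents, slim_terms))
    return counts


# Toy ontology (hypothetical term IDs): child -> list of parents.
parents = {"T5": ["T3"], "T4": ["T3"], "T3": ["T1"], "T2": ["T1"], "T1": ["T0"]}
slim = {"T1", "T3"}
print(collapse_to_slim(["T2", "T4", "T5"], parents, slim))
```

Three enriched fine-grained terms collapse onto two slim terms here, which is the kind of reduction that makes a long GO BP result list easier to read.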

And yes, it's normal practice to "pull out results that help you tell your story", but they need to be validated in the wet lab; otherwise they are worthless.