Question

Filtering relevant Gene Ontology (GO) results from Gene Set Enrichment Analysis (GSEA)

Asked 3.2 years ago by Ayesha

Hi all,

I am new to bioinformatics and am currently learning how to use GSEA.

Background: I analyzed my RNA-Seq data with DESeq2 and am now learning to perform GSEA. In broad terms, my project compares samples from sick patients with samples from healthy patients. My plan was to run GSEA to identify gene sets enriched in the sick patients, and then perform Leading Edge analysis to find genes that are shared across many of the enriched gene sets. I am particularly interested in gene sets and genes related to immune responses. The MSigDB collection I used was the GO Biological Process (BP) gene sets.

As I am working with this program for the first time, I am stumped about two things:

1. GSEA reported that "888 gene sets are significantly enriched at nominal pvalue < 1%" and "1815 gene sets are significant at FDR < 25%", so I am a bit overwhelmed by the volume of data. In this type of analysis, is it sufficient to work with, for example, the top 50 enriched gene sets and continue my Leading Edge analysis with those? My guess is no, and that this would cause me to miss potentially interesting results?
2. Among my top 20 enriched gene sets, for example, several relate to my experiment and what I am interested in, such as pattern recognition receptor signaling, TLR signaling, and antigen processing. There are also highly enriched gene sets such as Golgi vesicle transport, ER to Golgi mediated transport, and vesicle targeting, which interest me less because they have less to do with immune responses. Is there a method to filter my GO results down to those related to the immune response, and then perform Leading Edge analysis on that filtered subset?
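For the second question, one simple approach is to filter the GSEA results table by gene set name before running Leading Edge analysis. The sketch below is a hedged illustration, not a built-in GSEA feature: it assumes the results have been exported as (name, NES, FDR) rows, and both the keyword list and the example rows are made-up toy values. It keeps gene sets that pass an FDR cutoff and whose names contain an immune-related keyword.

```python
# Minimal sketch: keep enriched gene sets whose names suggest an immune role.
# IMMUNE_KEYWORDS and the example rows are hypothetical, for illustration only.
IMMUNE_KEYWORDS = ("IMMUNE", "TOLL_LIKE", "PATTERN_RECOGNITION", "ANTIGEN",
                   "INTERFERON", "CYTOKINE", "INFLAMMATORY")


def filter_immune(results, keywords=IMMUNE_KEYWORDS, fdr_cutoff=0.25):
    """Return (name, nes, fdr) rows that pass the FDR cutoff and match a keyword."""
    kept = []
    for name, nes, fdr in results:
        upper = name.upper()
        if fdr < fdr_cutoff and any(k in upper for k in keywords):
            kept.append((name, nes, fdr))
    # Sort by NES, largest first, so the strongest enrichment is listed first.
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Toy results table: (gene set name, NES, FDR q-value). Values are invented.
results = [
    ("GOBP_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY", 2.31, 0.001),
    ("GOBP_GOLGI_VESICLE_TRANSPORT", 2.45, 0.001),
    ("GOBP_ANTIGEN_PROCESSING_AND_PRESENTATION", 2.10, 0.004),
    ("GOBP_VESICLE_TARGETING", 2.02, 0.006),
    ("GOBP_PATTERN_RECOGNITION_RECEPTOR_SIGNALING_PATHWAY", 1.98, 0.30),
]

for row in filter_immune(results):
    print(row)
```

A keyword match is crude: it can miss immune-related terms whose names lack these words and can catch unrelated ones. A more principled alternative is to keep only GO terms that descend from GO:0002376 (immune system process) in the GO hierarchy, which requires loading the ontology graph and is not shown here.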

I guess what it boils down to is this: I am overwhelmed by the number of enriched gene sets and the volume of data, and I am unsure where to go next in my analysis. Ideally, I would like to narrow my list of DEGs down to a few genes that I could explore further for their role in disease pathophysiology.

I would appreciate any help/suggestions/advice! I hope my question was clear - I am still new to bioinformatics and am not always certain about the terminology and stuff :)

Tags: GSEA, leadingedgeanalysis, DESeq2

Hi!

There are a few strategies that could help here.

First, if you want to study gene sets related to the immune response, you can create your own GMT file containing only the gene sets associated with the immune response. This reduces the number of gene sets tested.
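A GMT file is plain text with one gene set per line: the set name, a description (MSigDB files often put a URL here), and then the member genes, all tab-separated. The sketch below reads GMT text, keeps the sets whose names contain a keyword, and writes the subset back out. The keywords and toy gene sets are hypothetical; with a real MSigDB download you would pass an open file handle instead of the in-memory string.

```python
import io


def read_gmt(handle):
    """Parse GMT text: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description = fields[0], fields[1]
        gene_sets[name] = (description, [g for g in fields[2:] if g])
    return gene_sets


def subset_gmt(gene_sets, keywords):
    """Keep gene sets whose name contains any keyword (case-insensitive)."""
    return {name: value for name, value in gene_sets.items()
            if any(k.upper() in name.upper() for k in keywords)}


def write_gmt(gene_sets, handle):
    """Write gene sets back out in GMT format, one set per line."""
    for name, (description, genes) in gene_sets.items():
        handle.write("\t".join([name, description, *genes]) + "\n")


# Hypothetical toy GMT content standing in for a downloaded GO:BP file.
toy_gmt = (
    "GOBP_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY\tna\tTLR4\tMYD88\tIRAK4\n"
    "GOBP_GOLGI_VESICLE_TRANSPORT\tna\tCOPA\tSEC23A\n"
    "GOBP_ANTIGEN_PROCESSING_AND_PRESENTATION\tna\tHLA-DRA\tCD74\n"
)
immune = subset_gmt(read_gmt(io.StringIO(toy_gmt)), ["TOLL_LIKE", "ANTIGEN", "IMMUNE"])
out = io.StringIO()
write_gmt(immune, out)
print(out.getvalue(), end="")
```

One caveat: running GSEA on a smaller collection changes the set of tests the FDR is computed over, so the resulting q-values are not directly comparable to those from the full GO BP run.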

Second, I suggest running your analysis with an R package such as fgsea. Running GSEA with this package will make it easier to perform leading edge analysis, following some advice from this post.
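To make "leading edge" concrete: GSEA walks down the ranked gene list, stepping up at genes in the set (weighted by their ranking metric) and down at genes outside it. The enrichment score (ES) is the largest deviation of this running sum from zero, and the leading edge is the set's genes at or before that peak (at or after it, for a negative ES). The sketch below implements that running-sum calculation with the standard weighting exponent p = 1. It is only an illustration on toy data: it computes the raw ES without the permutation-based NES or p-values, and the function names are hypothetical. fgsea reports the equivalent subset in the `leadingEdge` column of its results.

```python
def running_es(ranked, gene_set, p=1.0):
    """Running enrichment sum for gene_set over ranked [(gene, metric)], sorted descending."""
    in_set = [gene in gene_set for gene, _ in ranked]
    n, n_hit = len(ranked), sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the ranked list only partially")
    # Hits are weighted by |metric|^p and normalized to sum to 1.
    norm_hit = sum(abs(m) ** p for (_, m), hit in zip(ranked, in_set) if hit)
    # Misses all step down by the same amount and also sum to 1.
    miss_step = 1.0 / (n - n_hit)
    running, curve = 0.0, []
    for (_, m), hit in zip(ranked, in_set):
        running += abs(m) ** p / norm_hit if hit else -miss_step
        curve.append(running)
    return curve


def leading_edge(ranked, gene_set, p=1.0):
    """Return (ES, leading-edge genes) for gene_set."""
    curve = running_es(ranked, gene_set, p)
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    es = curve[peak]
    # Positive ES: genes up to the peak; negative ES: genes from the peak onward.
    window = ranked[: peak + 1] if es >= 0 else ranked[peak:]
    return es, [gene for gene, _ in window if gene in gene_set]


# Toy ranked list (gene, ranking metric) and a toy gene set.
ranked = [("A", 3.0), ("B", 2.5), ("C", 2.0), ("D", 1.0),
          ("E", 0.5), ("F", -0.5), ("G", -1.5), ("H", -2.0)]
gene_set = {"A", "C", "G"}
print(leading_edge(ranked, gene_set))
```

Here G is in the set but falls after the peak, so it is excluded from the leading edge even though it contributes to the running sum.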

Also, I have a question: what is your input data for GSEA? Genes passing an abundance filter, or genes obtained from the differential expression analysis?
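The input question matters because GSEA is designed to run on a ranked list of all tested genes, not only on the significant DEGs. With DESeq2 output, one common choice (not the only one) is to rank genes by the Wald statistic in the `stat` column. The sketch below assumes the results table has been exported as rows with `gene` and `stat` fields; the function name and the rows shown are made up for illustration.

```python
import math


def ranked_list(rows):
    """Build a GSEA ranked list from DESeq2-style result rows.

    Each row is a dict with a 'gene' key and a 'stat' value (None or NaN
    when no statistic is available). All tested genes are kept, not only
    significant ones, sorted from most up- to most down-regulated.
    """
    ranked = []
    for row in rows:
        stat = row.get("stat")
        if stat is None or math.isnan(stat):
            continue  # missing statistics carry no ranking information
        ranked.append((row["gene"], stat))
    return sorted(ranked, key=lambda gs: gs[1], reverse=True)


# Toy rows; values are invented, not from any real dataset.
rows = [
    {"gene": "TLR4", "stat": 6.2},
    {"gene": "GAPDH", "stat": 0.1},
    {"gene": "CD74", "stat": 4.8},
    {"gene": "COL1A1", "stat": -3.9},
    {"gene": "LOWCOUNT1", "stat": float("nan")},
]
print(ranked_list(rows))
```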

Best regards

Reply by rodolfo.peacewalker, 3.2 years ago

Answer (2021-09-04)

I don't think anyone would study ALL the enriched gene sets, whether the number was 800 or 100. In fact, I'm pretty sure that would still hold even if the number was 30.

I think the question is whether you want to pick a limited number of gene sets to study based on objective statistical criteria, or whether you have a preference for a certain group (immune response). If you want to do it objectively, lowering the FDR cutoff from 25% to, say, 5% will increase confidence and also reduce the number of enriched gene sets. The same is true if you use a stricter nominal p-value cutoff, say 0.005 or even 0.001. But no matter what cutoffs you pick, chances are there will still be more gene sets than anyone could study, yet the larger number may help you see the overall pattern better (with GO or otherwise). My suggestion is not to worry too much about the final number: report all of them, and then pick the subset that is most interesting to you.
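To see how much stricter cutoffs shrink the list, you can count the gene sets that pass several thresholds at once. The sketch below assumes the GSEA report has been exported as rows holding a nominal p-value and an FDR q-value per gene set; the names and numbers are toy values, not results from this dataset, and `passing` is an illustrative helper.

```python
def passing(results, fdr_cutoff=0.25, p_cutoff=1.0):
    """Names of gene sets with FDR q < fdr_cutoff and nominal p < p_cutoff."""
    return [r["name"] for r in results
            if r["fdr"] < fdr_cutoff and r["pval"] < p_cutoff]


# Toy GSEA results (invented values).
results = [
    {"name": "SET_A", "pval": 0.0001, "fdr": 0.002},
    {"name": "SET_B", "pval": 0.0008, "fdr": 0.01},
    {"name": "SET_C", "pval": 0.003, "fdr": 0.04},
    {"name": "SET_D", "pval": 0.008, "fdr": 0.12},
    {"name": "SET_E", "pval": 0.02, "fdr": 0.22},
    {"name": "SET_F", "pval": 0.09, "fdr": 0.41},
]

for fdr_cut, p_cut in [(0.25, 1.0), (0.05, 1.0), (0.05, 0.001)]:
    names = passing(results, fdr_cut, p_cut)
    print(f"FDR < {fdr_cut}, p < {p_cut}: {len(names)} gene sets")
```

Reporting the full table and then applying a stricter cutoff (or a biological filter) for follow-up keeps both views available, in line with the suggestion above.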