Question

How to use a DEG file for GSEA?

1

3.9 years ago

fifty_fifty ▴ 70

I want to run GSEA on my DEGs from an scRNA-seq analysis. The file contains gene names, logFC, p-values, and adjusted p-values. However, the Broad Institute GSEA tutorial on formatting input files uses a file with gene expression across multiple samples, not DEGs.

Is there any way to use DEG input in the GSEA software? If not, are there other GSEA tools that can calculate enrichment scores from DEG data?

gsea RNA-Seq scRNA-seq Seurat • 3.4k views

ADD COMMENT • link updated 3.5 years ago by yeshiwork2000 • 0 • written 3.9 years ago by fifty_fifty ▴ 70

0

Hello all, I have the same question. The Excel file of my DEG data has several values, such as read count, TPM, FPKM, and the number of individual samples in each group. Which of these values should we use for GSEA? Thank you.

ADD REPLY • link 3.5 years ago by yeshiwork2000 • 0

score 1 · Answer 1 · 2021-01-02

1

3.9 years ago

rpolicastro 13k

GSEA generally requires a numeric value for every gene, because its calculation relies on the relative rank of the genes in a term versus all other genes in the dataset. It would be better to export the log2 FC of all genes, without filtering by a fold change or adjusted p-value threshold. Alternatively, you could perform a regular (hypergeometric-like) enrichment analysis with your DEGs and the term database of your choice.
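To illustrate why the full ranking matters, here is a minimal Python sketch. It assumes a small made-up DE table (the gene names, values, and the `de_table`, `build_ranked_list`, `to_rnk`, and `running_sum_es` names are all hypothetical). It sorts every gene by log2 FC, formats the result as tab-separated gene/score lines in the style of the `.rnk` input used by GSEA's preranked mode, and computes a simplified, unweighted running-sum enrichment score. The real GSEA statistic usually weights hits by the ranking metric, so treat this only as a picture of the idea, not as the software's implementation.

```python
import csv
import io

# Hypothetical DE table: every tested gene, NOT only the significant ones.
de_table = [
    {"gene": "GeneA", "log2FC": 2.1, "padj": 0.001},
    {"gene": "GeneB", "log2FC": -1.4, "padj": 0.20},
    {"gene": "GeneC", "log2FC": 0.3, "padj": 0.85},
    {"gene": "GeneD", "log2FC": -0.1, "padj": 0.95},
    {"gene": "GeneE", "log2FC": 1.2, "padj": 0.04},
]


def build_ranked_list(rows):
    """Return (gene, log2FC) pairs sorted from most up- to most down-regulated."""
    pairs = ((row["gene"], row["log2FC"]) for row in rows)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def to_rnk(ranked):
    """Format the ranked list as tab-separated 'gene<TAB>score' lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(ranked)
    return buffer.getvalue()


def running_sum_es(ranked, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) enrichment score.

    Walk down the ranked list, stepping up at genes in the set and down
    at genes outside it; the score is the largest deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


ranked = build_ranked_list(de_table)
rnk_text = to_rnk(ranked)
es = running_sum_es(ranked, {"GeneA", "GeneE"})
```

Because the score depends on where the set's genes fall relative to all the others, dropping the non-significant genes beforehand changes the background the walk is measured against.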

0

Thank you for your comment. If I use the log2FC of the DEGs without a p-value restriction, how can I ensure the significance of the results? Some genes in my data have a high logFC but also a very high adjusted p-value. Also, how do I impute values for DEGs that are absent in some clusters?

ADD REPLY • link 3.9 years ago by fifty_fifty ▴ 70

0

You can't ensure the significance of the results, because manipulating the data to any appreciable extent violates the assumptions of the test. As long as you are not filtering the data by p-value or log FC, don't worry if some clusters are missing genes because of low or no expression. If you feel your log FC or adjusted p-value thresholds are critical, you should perhaps switch to an overrepresentation test.
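For reference, an overrepresentation test on a thresholded DEG list is typically a one-sided hypergeometric test. The sketch below uses only the Python standard library; the function name `hypergeom_enrichment_p` and the counts in the example are made up for illustration, and a real analysis would also correct for testing many terms.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_in_term, n_degs, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: number of genes tested (the background)
    n_in_term:  background genes annotated to the term
    n_degs:     genes passing the DEG thresholds
    n_overlap:  DEGs that are annotated to the term
    """
    max_overlap = min(n_in_term, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_degs - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes tested, 5 in the term, 4 DEGs, 3 of them in the term.
p_value = hypergeom_enrichment_p(n_universe=20, n_in_term=5, n_degs=4, n_overlap=3)
```

Here the thresholds only decide which genes count as DEGs; the background should still be every gene that was tested, not just the DEGs.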

ADD REPLY • link 3.9 years ago by rpolicastro 13k

0

Thank you, I understand that now. So I think I should use the average expression of all genes across clusters and treat the clusters as if they were samples in bulk RNA-seq GSEA. If I understand correctly, bulk GSEA expects disease and normal samples. In scRNA-seq, however, cluster marker genes are usually found by comparing the expression of a gene in the cluster of interest against all remaining cells. How should I choose 'phenotypes' in this case?

ADD REPLY • link 3.9 years ago by fifty_fifty ▴ 70