Question

Questions regarding GSEA - clusterProfiler

3 months ago

CaroH ▴ 10

Good morning everyone,

I recently received some bulk RNA sequencing data: 3 controls and 3 treated samples (from mice). I identified the DEGs using the DESeq package. Out of the >17000 genes detected, approximately 150 had a p-value below 0.05. If I look at the genes with a padj below 0.05, there are fewer than 5.

However, I am fairly new to these analyses, and even though the DEG analysis seems to indicate that not much is happening in the presence of my treatment, I ran clusterProfiler. To perform GSEA, I used the entire gene list (>17000 genes) ranked by log2FoldChange. Several KEGG pathways show enrichment. For example, tyrosine metabolism has an NES of 2 with a pvalue of 3.00542E-05 and a padj of 3.00542E-05. Using gseKEGG gave me 54 results.

I was wondering whether there is a way to incorporate the p-values, and whether that is necessary to perform GSEA. For example, should I only use the 150 genes with a p-value below 0.05? I have read several posts saying that this reduces the power of the analysis.

What am I doing wrong? What would you advise?

Thanks!

RNA-seq GSEA • 726 views

updated 3 months ago by mark.ziemann ★ 2.0k • written 3 months ago by CaroH ▴ 10

score 1 · Answer 1 · 2025-01-15

3 months ago

mark.ziemann ★ 2.0k

One of the best ways to run GSEA is to use the test statistic from DESeq2 as the gene scoring metric. It takes into account the statistical strength of differential expression as well as the fold change, so it will probably be less noisy than the log2FoldChange.
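To illustrate why the test statistic tends to be less noisy, here is a minimal sketch in plain Python (not DESeq2 or clusterProfiler code). It assumes the default Wald test, where the statistic is the log2 fold change divided by its standard error. The gene names and numbers are made up for illustration.

```python
# Hypothetical DESeq2-style rows: (gene, log2FoldChange, lfcSE).
# All values are invented for illustration only.
rows = [
    ("GeneA", 3.0, 2.5),   # large fold change, but very uncertain
    ("GeneB", 1.2, 0.2),   # moderate fold change, precisely estimated
    ("GeneC", -0.8, 0.1),  # modest decrease, precisely estimated
    ("GeneD", -2.5, 2.0),  # large decrease, but very uncertain
]


def wald_stat(lfc, se):
    """Wald statistic (assumed default test): fold change scaled by its standard error."""
    return lfc / se


# Ranking by raw fold change puts the uncertain genes at the extremes.
by_lfc = [gene for gene, lfc, se in sorted(rows, key=lambda r: r[1], reverse=True)]

# Ranking by the statistic puts the well-supported genes at the extremes.
by_stat = [
    gene
    for gene, lfc, se in sorted(rows, key=lambda r: wald_stat(r[1], r[2]), reverse=True)
]

print("ranked by log2FoldChange:", by_lfc)
print("ranked by stat:          ", by_stat)
```

In this toy example the genes with large but poorly supported fold changes (GeneA, GeneD) move away from the ends of the ranking once the standard error is taken into account, and the ends of the ranked list are what drive the GSEA enrichment score.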

Regarding ORA enrichment with clusterProfiler, there really is no rule that you must select only the genes with FDR<0.05. You could just as well set the threshold to FDR<0.1 or p<0.01, or take the top 1000 up- and downregulated genes for enrichment. A similar approach has been described previously (here). Using fewer than 200 genes is not recommended for ORA, as the sample is so small that you are likely to have low sensitivity.
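The alternative selection rules above can be sketched as follows. This is plain Python on an invented six-gene table, not the clusterProfiler API. The helper names (`select_by_cutoff`, `select_top_n`, `ora_pvalue`) are hypothetical, and the hypergeometric tail probability is shown only as the usual over-representation test.

```python
from math import comb

# Hypothetical DE results; None stands in for NA (e.g. padj of filtered genes).
table = [
    {"gene": "G1", "stat": 5.1, "pvalue": 0.0001, "padj": 0.04},
    {"gene": "G2", "stat": 3.2, "pvalue": 0.004, "padj": 0.09},
    {"gene": "G3", "stat": 2.1, "pvalue": 0.03, "padj": 0.30},
    {"gene": "G4", "stat": 0.2, "pvalue": 0.80, "padj": None},
    {"gene": "G5", "stat": -2.9, "pvalue": 0.008, "padj": 0.09},
    {"gene": "G6", "stat": -4.4, "pvalue": 0.0005, "padj": 0.04},
]


def select_by_cutoff(rows, column, cutoff):
    """Genes whose value in `column` is present and below `cutoff`."""
    return [r["gene"] for r in rows if r[column] is not None and r[column] < cutoff]


def select_top_n(rows, n):
    """The n most up-regulated and n most down-regulated genes, ranked by stat."""
    ranked = sorted(rows, key=lambda r: r["stat"], reverse=True)
    up = [r["gene"] for r in ranked[:n]]
    down = [r["gene"] for r in ranked[-n:]]
    return up, down


def ora_pvalue(hits, universe_size, set_size, n_selected):
    """Hypergeometric P(X >= hits): chance of at least `hits` set members among the selected genes."""
    upper = min(set_size, n_selected)
    favourable = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_selected - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(universe_size, n_selected)


fdr_05 = select_by_cutoff(table, "padj", 0.05)
fdr_10 = select_by_cutoff(table, "padj", 0.1)
p_01 = select_by_cutoff(table, "pvalue", 0.01)
up, down = select_top_n(table, 2)

# Toy gene set {G1, G2, G6}: both FDR<0.05 genes fall in it.
p = ora_pvalue(hits=2, universe_size=6, set_size=3, n_selected=2)
print(fdr_05, fdr_10, p_01, up, down, p)
```

Even with a perfect overlap, the toy p-value is not small because so few genes were selected, which is the sensitivity problem with very short gene lists.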

Hello. Thank you for your answer. What do you mean by the test statistic from DESeq2 as the gene scoring metric?

Reply • 3 months ago by CaroH ▴ 10

The output DE table from DESeq has the following column headers: "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue" and "padj". The test statistic is the "stat" column.
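As a rough sketch of how that column could be turned into a ranked gene list, here is a plain Python example. It assumes the results table was exported to CSV with the gene identifiers in a column named `gene`, which is an assumption about the export rather than something DESeq produces by default. The rows are invented.

```python
import csv
import io

# Hypothetical excerpt of an exported results table (values invented).
csv_text = """gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
Th,120.5,1.8,0.4,4.5,0.0000068,0.002
Actb,9000.1,0.05,0.1,0.5,0.62,0.9
Lowexp,0.0,NA,NA,NA,NA,NA
Dbh,80.2,-1.2,0.5,-2.4,0.016,0.2
"""


def ranked_by_stat(text):
    """Return (gene, stat) pairs sorted from most positive to most negative stat."""
    reader = csv.DictReader(io.StringIO(text))
    # Genes with no test statistic (NA) cannot be ranked, so they are skipped.
    scores = {row["gene"]: float(row["stat"]) for row in reader if row["stat"] != "NA"}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = ranked_by_stat(csv_text)
for gene, stat in ranking:
    print(gene, stat)
```

In R, the equivalent would be a named numeric vector of the "stat" values sorted in decreasing order, which is the kind of ranked input that gseKEGG takes as its gene list.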

Reply • 3 months ago by mark.ziemann ★ 2.0k