Questions regarding GSEA - clusterProfiler
11 weeks ago
CaroH ▴ 10

Good morning everyone,

I recently received some bulk RNA sequencing data: 3 controls and 3 treated samples (from mice). I identified the DEGs using the DESeq package. Out of the >17000 genes detected, approximately 150 had a p-value below 0.05. If I look at genes with a padj below 0.05, there are fewer than 5.

However, I am fairly new to this kind of analysis, and even though the DEG analysis seems to indicate that not much is happening in the presence of my treatment, I ran clusterProfiler. To perform GSEA, I use the entire gene list (>17000 genes) ranked by log2FoldChange. I see several KEGG pathways showing enrichment. For example, tyrosine metabolism shows an NES of 2 with a p-value of 3.00542E-05 and a padj of 3.00542E-05. Using gseKEGG gave me 54 results.

I was wondering whether there is a way to incorporate the p-values into this analysis, and whether it is even appropriate to perform GSEA here. For example, should I only use the 150 genes with a p-value below 0.05? I have read several posts saying that this reduces the power of the analysis...

What am I doing wrong? What would you advise?

Thanks !

RNA-seq GSEA • 624 views
11 weeks ago
mark.ziemann ★ 2.0k

One of the best ways to run GSEA is to use the test statistic from DESeq2 as the gene scoring metric. It takes into account the statistical strength of the differential expression as well as the fold change, so it will probably be less noisy than the log2FoldChange.

Regarding ORA enrichment with clusterProfiler, there really is no rule that you need to select only the genes with FDR<0.05. You could just as well set the threshold to FDR<0.1 or p<0.01, or take the top 1000 up- and downregulated genes for enrichment. A similar approach has been described previously (here). Using fewer than 200 genes is not recommended for ORA, as the sample is so small that you are likely to have low sensitivity.
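As a rough sketch of those selection options in base R: the table below is a toy stand-in for a DESeq2 results table, all values are invented, and the names `sel_fdr`, `sel_up`, `sel_down` and `n_top` are only illustrative. The commented `enrichKEGG()` call assumes mouse data with gene IDs that KEGG accepts (Entrez IDs by default).

```r
# Toy stand-in for a DESeq2 results table; all values are invented.
res <- data.frame(
  gene   = paste0("Gene", 1:8),
  stat   = c(5.1, 3.9, 2.8, 0.2, -0.1, -2.6, -3.3, -4.8),
  pvalue = c(1e-6, 1e-4, 0.005, 0.84, 0.92, 0.009, 0.001, 2e-6),
  padj   = c(1e-4, 0.004, 0.08, 0.95, 0.95, 0.12, 0.03, 1e-4)
)

# Option 1: a looser FDR cut-off than the usual 0.05.
# padj can be NA in real DESeq2 output, so guard against that.
sel_fdr <- res$gene[!is.na(res$padj) & res$padj < 0.1]

# Option 2: a fixed number of genes from each end of the ranking.
# n_top = 2 only because the toy table is tiny; the suggestion above was 1000.
n_top    <- 2
ranked   <- res[order(res$stat, decreasing = TRUE), ]
sel_up   <- head(ranked$gene, n_top)  # most strongly upregulated
sel_down <- tail(ranked$gene, n_top)  # most strongly downregulated

print(sel_fdr)
print(sel_up)
print(sel_down)

# With real data, each set could then go to ORA separately, for example:
# library(clusterProfiler)
# ora_up <- enrichKEGG(gene = sel_up, organism = "mmu")
```

Running up- and downregulated genes as separate sets keeps the direction of change interpretable in the enrichment results.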

Hello. Thank you for your answer. What do you mean by the test statistic from DESeq2 as the gene scoring metric ?

The output DE table from DESeq2 has the following column headers: "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue" and "padj". The test statistic is the "stat" column.
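For illustration, here is a minimal base R sketch of turning that column into a ranked list for GSEA. The data frame is a toy stand-in for `as.data.frame(results(dds))`, the values are made up, and `gene_list` is just an illustrative name. The commented `gseKEGG()` call assumes mouse data and that the names of the vector have already been converted to Entrez IDs.

```r
# Toy stand-in for as.data.frame(results(dds)); values are made up.
res <- data.frame(
  row.names      = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  log2FoldChange = c(2.5, -0.3, 1.8, -2.2, NA),
  stat           = c(1.1, -0.4, 4.2, -3.7, NA),
  padj           = c(0.60, 0.90, 0.001, 0.004, NA)
)

# Drop genes without a test statistic (DESeq2 can report NA here).
res <- res[!is.na(res$stat), ]

# Named numeric vector sorted in decreasing order,
# which is the input format the GSEA functions expect.
gene_list <- sort(setNames(res$stat, rownames(res)), decreasing = TRUE)
print(gene_list)

# With real data (names converted to Entrez IDs), something like:
# library(clusterProfiler)
# gsea_res <- gseKEGG(geneList = gene_list, organism = "mmu", pvalueCutoff = 0.05)
```

In this toy table GeneA has the largest log2FoldChange but only a small test statistic, so it ranks below GeneC. That is the sense in which ranking by "stat" is less noisy than ranking by fold change alone.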
