Good morning everyone,
I recently received some bulk RNA sequencing data: 3 control and 3 treated samples (from mice). I identified the DEGs using the DESeq2 package. Out of the >17000 genes detected, approximately 150 genes had a pvalue below 0.05. If I look at the number of genes with a padj below 0.05, it's fewer than 5 genes.
However, I am fairly new to this kind of analysis, and even though the DEG analysis seems to indicate that not a lot is happening in the presence of my treatment, I ran clusterProfiler anyway. To perform GSEA, I use the entire gene list (>17000 genes) ranked by log2FoldChange. I see several KEGG pathways showing enrichment. For example, tyrosine metabolism shows a NES of 2 with a pvalue of 3.00542E-05 and a padj of 3.00542E-05. Using gseKEGG gave me 54 results.
I was wondering whether there is a way to incorporate the p-values, and whether doing so is necessary for GSEA. For example, should I only use the 150 genes with a pvalue below 0.05? I have read several posts saying that this reduces the power of the analysis...
What am I doing wrong? What would you advise?
Thanks!
Hello. Thank you for your answer. What do you mean by using the test statistic from DESeq2 as the gene scoring metric?
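The output DE table from DESeq2 has the following column headers: "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue" and "padj". The test statistic is the "stat" column. With DESeq2's default Wald test, it is the log2FoldChange divided by its standard error (lfcSE), so it reflects both the size of the change and how reliably it was estimated.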
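To make the idea concrete, here is a minimal sketch of ranking genes by the "stat" column instead of log2FoldChange. It is written in plain Python (standard library only) purely for illustration. The gene names and all numbers are made up, the extra "gene" column and the helper name `ranked_by_stat` are hypothetical, and it assumes the table was exported as CSV with missing values written as NA.

```python
import csv
import io

# Made-up rows in the shape of a DESeq2 results table (illustrative only).
# GeneE mimics a gene with zero counts in all samples, reported as NA.
DE_TABLE = """gene,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
GeneA,250.0,1.20,0.30,4.00,0.00006,0.01
GeneB,12.0,3.00,2.00,1.50,0.13,0.60
GeneC,900.0,-0.80,0.20,-4.00,0.00006,0.01
GeneD,40.0,0.10,0.50,0.20,0.84,0.95
GeneE,0.0,NA,NA,NA,NA,NA
"""


def ranked_by_stat(table_text):
    """Return (gene, stat) pairs sorted from most positive to most negative stat.

    Hypothetical helper: genes whose stat is NA are dropped before ranking.
    """
    rows = csv.DictReader(io.StringIO(table_text))
    scored = [(r["gene"], float(r["stat"])) for r in rows if r["stat"] != "NA"]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = ranked_by_stat(DE_TABLE)
for gene, stat in ranking:
    print(gene, stat)
```

In this toy table GeneB has the largest log2FoldChange (3.00) but also a large lfcSE, so it ranks below GeneA once the "stat" column is used. In R, the equivalent would be a named numeric vector of the "stat" values sorted in decreasing order, which is the kind of ranked geneList that gseKEGG takes as input.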