Hi all,
I have been working on a project in which I use a limma pipeline to find DEGs from transcriptomic data. Once I have my list of DEGs at our set threshold, we want to perform some sort of pathway analysis (ORA using clusterProfiler) that is directional, labeling pathways as suppressed or activated. I then put that list of genes into the gseGO and gseKEGG functions, and produce dot plots that show activated and suppressed pathways.
From what I now understand:
- You should be using the entire counts matrix for GSEA, not a subset of the genes (i.e. the DEGs).
  - It can be a pre-ranked list, or you can do more using the GSEA package and enter some sort of normalized counts (I can't remember exactly which DESeq2 function to use).
- For a pathway analysis of your list of DEGs, you should be doing some sort of over-representation analysis, like enrichKEGG or enrichGO.
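To make the "pre-ranked list" point concrete, here is a minimal sketch of how such a list could be built from the full results table rather than from the DEGs only. It is written in plain Python just to show the logic; the gene names and statistics are made up, and `make_ranked_list` is a hypothetical helper, not part of any package. In R, the equivalent would be a named numeric vector of a per-gene statistic (for example the moderated t-statistic from limma) sorted in decreasing order, which, as far as I know, is the form gseGO and gseKEGG expect.

```python
import math

# Hypothetical limma-style output for ALL tested genes (not just DEGs):
# gene -> moderated t-statistic. Values are invented for illustration.
all_gene_stats = {
    "GeneA": 5.2,
    "GeneB": -3.1,
    "GeneC": 0.4,
    "GeneD": float("nan"),  # e.g. a gene that could not be tested
    "GeneE": -0.2,
}


def make_ranked_list(stats):
    """Return (gene, statistic) pairs sorted from most up to most down.

    Genes with a missing statistic are dropped; no significance
    threshold is applied, because GSEA needs the whole ranking.
    """
    kept = [(gene, s) for gene, s in stats.items() if not math.isnan(s)]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


ranked = make_ranked_list(all_gene_stats)
for gene, stat in ranked:
    print(gene, stat)
```

The key design choice is that nothing is filtered by p-value or fold change: the ranking covers every tested gene, and the sign of the statistic carries the direction.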
So I have the following question: What method could I use that is a form of directional Pathway analysis? Is the way that I am going about things wrong, and should I change my approach? Is there even a way to do this?
I am thankful for any contributions or feedback.
Edits: added clarification about what I mean by directional pathway analysis; corrected capitalization of clusterProfiler.
While Gordon's answer is the technically correct one from a statistical point of view, please keep in mind that genes in a pathway being upregulated does not mean that the pathway is activated, nor does downregulation of genes in a pathway mean that the pathway is repressed. In fact, it's possible for there to be an enrichment of both upregulated and downregulated genes in a pathway, and this does not mean that the activity of the pathway is overall unchanged.
Yes, I had already noted in my answer that the GO and KEGG annotations are not directional, meaning that genes in a GO term or KEGG pathway may go up or down when the relevant biological process is active. There are other databases that are more directional and, in particular, the MSigDB generally provides directional gene sets because the sets are obtained from DE experiments.
Moreover, the limma functions roast and fry will accept a vector of logFCs with each gene set that define the direction and/or magnitude of change of each gene when the pathway is active, so that the user can in fact undertake truly directional tests.

GSEA is directional and will identify gene sets that are regulated in one direction. If you want to use ORA, you could separate up-regulated and down-regulated genes and check whether either group is associated with particular GO terms.
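The "separate up and down" idea for ORA can be sketched as follows. This is only an illustration under stated assumptions: it uses plain Python with a one-sided hypergeometric test, the gene names, fold changes, and gene sets are invented, and `hypergeom_sf` and `directional_ora` are hypothetical helpers. With clusterProfiler, the same thing would amount to calling enrichGO or enrichKEGG twice, once per direction, with all tested genes as the background. No multiple-testing correction is applied here, which a real analysis would need.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are in the set."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def directional_ora(logfc, degs, gene_sets):
    """Run ORA separately on up- and down-regulated DEGs.

    logfc:     gene -> log fold change for ALL tested genes (the universe)
    degs:      set of genes that passed the DE threshold
    gene_sets: set name -> set of member genes
    """
    universe = set(logfc)
    up = {g for g in degs if logfc[g] > 0}
    down = {g for g in degs if logfc[g] < 0}
    result = {}
    for name, members in gene_sets.items():
        members = members & universe  # ignore genes that were never tested
        result[name] = {
            "up": hypergeom_sf(len(up & members), len(universe), len(members), len(up)),
            "down": hypergeom_sf(len(down & members), len(universe), len(members), len(down)),
        }
    return result


# Toy data: 20 tested genes, g1-g5 up, g6-g10 down, the rest unchanged.
logfc = {f"g{i}": 2.0 for i in range(1, 6)}
logfc.update({f"g{i}": -2.0 for i in range(6, 11)})
logfc.update({f"g{i}": 0.1 for i in range(11, 21)})
degs = {f"g{i}" for i in range(1, 11)}
gene_sets = {
    "set_up": {"g1", "g2", "g3", "g4"},
    "set_down": {"g6", "g7", "g8", "g11"},
}

result = directional_ora(logfc, degs, gene_sets)
for name, pvals in result.items():
    print(name, pvals)
```

As noted earlier in the thread, a set enriched among up-regulated genes is not necessarily an "activated" pathway; the output only says in which direction the member genes tend to move.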
Ah thank you, this is pretty simple, I'm not sure why I didn't think of this. Thank you, I appreciate it!