I have a question that arose while using the GSReg package in R to analyze a gene expression dataset. I got a hit for the Reactome ERBB2 down-regulation pathway between samples A and B, but I'm unsure how to interpret it.
Is there a way to see with this package in which of the samples (A or B) the pathway is down-regulated? Or do I need to look at each gene in the pathway individually? Or can I only say that the pathway is deregulated between the samples?
Correct me if I am wrong, but this package takes into account all the genes expressed in your conditions and reports the most significant pathways, driven by the genes that vary most between your conditions. It relies largely on Differential Rank Conservation (DIRAC), which scores each pathway by how consistently the relative ordering of its genes is preserved within each condition. If you get a pathway hit, the genes of that pathway are behaving significantly differently between your conditions, so the pathway itself comes out as significant. This suggests that the pathway genes are deregulated in your samples, but it does not by itself tell you in which condition they are up. I would look at the direction of these genes to understand which condition has them up-regulated: extract the genes of the pathway, map them to your expression matrix and make a heatmap of them, and you will see in which condition they are up- or down-regulated. I am pretty sure ERBB2 itself will also come out of it.
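Because DIRAC compares the ordering of genes rather than their levels, a significant hit does not by itself say which group is down. A simplified sketch of the DIRAC idea in plain Python illustrates this. It assumes each sample is a dictionary of gene to expression value; the function names are hypothetical (this is not GSReg's API), and the permutation test GSReg uses for p-values is left out.

```python
from itertools import combinations
from statistics import mean


def ordering(sample, genes):
    # Binary vector over all gene pairs: 1 if the first gene is expressed below the second
    return [1 if sample[g1] < sample[g2] else 0 for g1, g2 in combinations(genes, 2)]


def rank_template(samples, genes):
    # Majority ordering of each gene pair among the samples of one phenotype
    vectors = [ordering(s, genes) for s in samples]
    return [1 if mean(col) > 0.5 else 0 for col in zip(*vectors)]


def rank_matching(sample, template, genes):
    # Fraction of gene pairs whose ordering in this sample agrees with the template
    return mean(1 if a == b else 0 for a, b in zip(ordering(sample, genes), template))


def rank_conservation(samples, genes):
    # Mean agreement of a phenotype's samples with that phenotype's own template
    template = rank_template(samples, genes)
    return mean(rank_matching(s, template, genes) for s in samples)


def dirac_difference(group_a, group_b, genes):
    # Positive: gene ordering is more tightly conserved in group A than in group B.
    # It says nothing about which group has lower expression.
    return rank_conservation(group_a, genes) - rank_conservation(group_b, genes)


# Toy data with hypothetical genes G1..G3
genes = ["G1", "G2", "G3"]
group_a = [{"G1": 1.0, "G2": 2.0, "G3": 3.0},
           {"G1": 1.1, "G2": 2.2, "G3": 3.3},
           {"G1": 0.9, "G2": 2.1, "G3": 2.9}]
group_b = [{"G1": 3.0, "G2": 1.0, "G3": 2.0},
           {"G1": 1.0, "G2": 3.0, "G3": 2.0},
           {"G1": 2.0, "G2": 1.0, "G3": 3.0}]
# Shifting every gene in group B down by the same amount leaves the score unchanged
group_b_lower = [{g: v - 5.0 for g, v in s.items()} for s in group_b]
print(dirac_difference(group_a, group_b, genes))
print(dirac_difference(group_a, group_b_lower, genes))
```

Both calls print the same value (about 0.33): lowering every gene in group B does not change the score, which is why the direction has to be checked gene by gene.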
Alternatively, do a differential expression analysis. If it's a microarray, I would go for limma here, or even RankProd, since GSReg is based on differential ranking. Overlap your DE genes with the genes of the pathway and plot the fold changes for the ones that match. A simple barplot of them will show how these genes behave across both conditions. The ERBB2 signalling pathway is the one you should target: overlap all genes from that pathway with your DEGs and plot the fold changes between your conditions for the genes they have in common. I hope I made myself clear.
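The overlap-and-barplot step can be sketched as follows, assuming the DE results have been exported as a mapping from gene to log2 fold change (for example from the `logFC` column of limma's `topTable` output). The gene names are hypothetical, and a crude text barplot stands in for a real plotting library.

```python
def pathway_fold_changes(log2fc, pathway_genes):
    # Keep only DE genes that also belong to the pathway
    shared = set(log2fc) & set(pathway_genes)
    # Sort from most down- to most up-regulated so the bars read in order
    return sorted(((g, log2fc[g]) for g in shared), key=lambda item: item[1])


def text_barplot(rows, scale=5):
    # One line per gene: '-' bars for negative, '+' bars for positive log2 fold change
    lines = []
    for gene, lfc in rows:
        bar = ("-" if lfc < 0 else "+") * max(1, round(abs(lfc) * scale))
        lines.append(f"{gene:>6} {lfc:+.2f} {bar}")
    return "\n".join(lines)


# Hypothetical log2 fold changes (B vs A) for DE genes, and a hypothetical pathway
log2fc = {"G1": -1.2, "G2": 0.4, "G3": -0.3, "G9": 2.0}
pathway = ["G1", "G2", "G3", "G4"]
print(text_barplot(pathway_fold_changes(log2fc, pathway)))
```

G9 is dropped because it is not in the pathway, and G4 because it is not a DE gene.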
Hi, and thanks for the answer! Much appreciated! However, as I understand GSReg, it only looks at the gene sets/pathways that you supply, so enrichment is not part of the package, although I could have misinterpreted that part. I've done most of what you've described, so I really want to understand the GSA (gene set analysis) part. I've used SAM to identify differentially expressed genes and then analyzed those with GO enrichment tools. Now I want to look at all the genes again, but this time with predefined gene sets.
This should be fairly simple. When you run GSReg, are you using your DEGs or all genes? Let's break down what you did so that I can advise you better. When you perform a GSEA analysis, with either GSReg or the Broad GSEA tool, the hypothesis is that your list of input genes is enriched for a specific pathway that contains, say, X genes. However, you have only Y genes among your DEGs. If you now select that pathway and supply your Y genes, the tool tests whether your Y genes are still strong enough to bring out an enrichment of that pathway. If they are, the genes you have strongly represent that pathway, because they overlap with a large share of the genes that belong to it. This overlap-based enrichment is usually tested with Fisher's exact test, comparing your Y genes against the X genes of that pathway, relative to the background of all tested genes.
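The overlap test described here can be sketched as a one-sided Fisher's exact test, which reduces to a hypergeometric upper-tail probability. This is a minimal version assuming the background is all genes tested (for example the 10,000 genes mentioned later in the thread); the pathway size and overlap in the example are made-up numbers. In R, `fisher.test` on the corresponding 2x2 table with `alternative = "greater"` should give the same p-value.

```python
from math import comb


def overlap_pvalue(n_background, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least n_overlap pathway genes when n_selected genes
    (e.g. DEGs) are drawn at random from n_background genes, n_pathway of which
    belong to the pathway.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    hits = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return hits / total


# Made-up example: 500 DEGs out of 10,000 genes, a 50-gene pathway, 10 genes shared
# (only about 2.5 would be expected by chance)
print(overlap_pvalue(10_000, 50, 500, 10))
```

Python's exact integer arithmetic keeps the very large binomial coefficients from overflowing before the final division.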
So what you can do is take your DEGs and run GO enrichment on all of them to see which biological processes are enriched.
Also run it separately on the up- and down-regulated DEGs. This shows which specific processes are enriched in each direction.
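Splitting the DEGs by direction before running GO is straightforward. A sketch, assuming the SAM (or other DE) results are available as gene to log2 fold change, with hypothetical gene names; the resulting lists can then be pasted into whichever GO enrichment tool you use, keeping the same background as before.

```python
def split_by_direction(log2fc, min_abs_lfc=0.0):
    # log2fc: gene -> log2 fold change (B vs A) for genes already called DE
    up = sorted(g for g, lfc in log2fc.items() if lfc > min_abs_lfc)
    down = sorted(g for g, lfc in log2fc.items() if lfc < -min_abs_lfc)
    return up, down


up, down = split_by_direction({"G1": -1.2, "G2": 0.4, "G3": -0.3, "G9": 2.0})
# One gene per line, a format many GO web tools accept for pasted lists
print("\n".join(up))
print("\n".join(down))
```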
Now, let's say you want to perform GSEA for a specific pathway and see how many of your DEGs are enriched in it. Select that pathway and test your DEGs for enrichment in it (only the genes in your list count toward this pathway enrichment). Run them through any GSEA tool and, if the enrichment is significant, you will be able to see whether the pathway is driven more by up-regulated or by down-regulated genes. Enrichment for a gene set means that a specific pathway is still significantly detected because your input genes are enriched for it: they are strong enough to represent the pathway even if they do not include all of the genes that make it up.
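To see whether a pathway is driven by up- or down-regulated genes, a preranked GSEA ranks all genes (not only the DEGs) by a signed statistic and walks down the list. Below is a minimal sketch of the weighted running-sum enrichment score used by the Broad GSEA tool. It assumes a list of (gene, statistic) pairs sorted from most up- to most down-regulated in B versus A, uses hypothetical genes, and omits the permutation test that gives the p-value. A positive score means the pathway genes cluster at the up-regulated end, a negative score at the down-regulated end.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    # ranked: (gene, signed statistic) pairs, sorted from most up- to most down-regulated
    # Hits are weighted by |statistic| ** weight (weight=1 is the usual GSEA default)
    in_set = [gene in gene_set for gene, _ in ranked]
    n_miss = in_set.count(False)
    hit_total = sum(abs(stat) ** weight for (_, stat), hit in zip(ranked, in_set) if hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must hit the ranked list but not cover all of it")
    running = best = 0.0
    for (_, stat), hit in zip(ranked, in_set):
        # Step up on a pathway gene, step down on any other gene
        running += abs(stat) ** weight / hit_total if hit else -1.0 / n_miss
        # Keep the maximum deviation from zero, with its sign
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0),
          ("G4", -1.0), ("G5", -2.0), ("G6", -3.0)]
print(enrichment_score(ranked, {"G5", "G6"}))  # pathway genes at the bottom: negative
print(enrichment_score(ranked, {"G1", "G2"}))  # pathway genes at the top: positive
```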
When you say predefined gene sets, are they derived from your DEGs or from some other features associated with them? Either way, you can take a pathway of your choice that came out of your GO analysis of the DEGs and run it through a GSEA tool together with your predefined gene set to see whether there is any enrichment at all. If there is, your predefined gene sets are strong enough to trigger the same biological pathway. This is tested with Fisher's exact test, asking whether your predefined list is enriched among the genes that make up that specific pathway. Is it clear now?
Thank you once again! I've run SAM on the total set (10,000 genes) and got 500 DE genes out. I then performed a GO enrichment analysis, which revealed ERBB2-associated genes as enriched compared with the input panel of 10,000 genes. Then I ran GSReg on all 10,000 genes against the Reactome pathway list and found that the ERBB2 down-regulation pathway was significant between my two groups. Now I would like to interpret this, and it seems like I need to look at the genes present in the pathway across the samples individually? Have I made myself clear now?
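Looking at the pathway genes across the samples individually, as suggested in the first answer, can be sketched as below: row-scale the pathway genes (the values a heatmap would colour) and compare group means per gene. It assumes a log-scale expression table stored as gene to {sample: value}, uses hypothetical gene and sample names, and simply skips pathway genes that are not on the array.

```python
from statistics import mean, pstdev


def scaled_pathway_matrix(expr, pathway_genes, samples):
    # Row-scale (z-score) each measured pathway gene across all samples, as a heatmap would
    matrix = {}
    for gene in pathway_genes:
        if gene not in expr:
            continue  # pathway gene not measured on this platform
        values = [expr[gene][s] for s in samples]
        mu, sd = mean(values), pstdev(values)
        matrix[gene] = {s: (expr[gene][s] - mu) / sd if sd > 0 else 0.0 for s in samples}
    return matrix


def direction_calls(expr, pathway_genes, samples_a, samples_b):
    # Mean log-expression difference (B minus A) per measured pathway gene
    calls = {}
    for gene in pathway_genes:
        if gene not in expr:
            continue
        diff = mean(expr[gene][s] for s in samples_b) - mean(expr[gene][s] for s in samples_a)
        label = "higher in B" if diff > 0 else "lower in B" if diff < 0 else "no change"
        calls[gene] = (label, diff)
    return calls


expr = {"G1": {"a1": 5.0, "a2": 5.2, "b1": 3.0, "b2": 3.2},
        "G2": {"a1": 2.0, "a2": 2.0, "b1": 4.0, "b2": 4.0}}
pathway = ["G1", "G2", "G3"]  # G3 is not on the array and is skipped
print(scaled_pathway_matrix(expr, pathway, ["a1", "a2", "b1", "b2"]))
print(direction_calls(expr, pathway, ["a1", "a2"], ["b1", "b2"]))
```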
1. So you did GO analysis with the 500 DEGs as input and your 10k genes as background. This is perfectly fine.
2. Then you ran GSReg on all 10k genes with Reactome as the pathway source and got the ERBB2 down-regulation pathway as significant between your conditions. This is also fine.
3. Now, if you want to do an enrichment test here for ERBB2, take the genes from the 10k gene list that belong to the ERBB2 pathway.
4. Take your DEGs or your predefined gene list and overlap them with the genes from step 3. If there is an overlap, check whether it is significant. Plot the fold changes of the overlapping genes to see how many are up or down between your conditions. This is one way of doing gene set enrichment. If you want to use a standard tool instead, see below.
5. Another way of doing gene set enrichment analysis is the GSEA tool from the Broad Institute (there are others as well). Take the genes (X) from your 10k gene list that map to ERBB2 signaling, so you know these genes represent ERBB2 signaling. Now take your predefined gene set or DEGs, say Y (whichever is the source of this list), and run an enrichment of them in the GSEA tool or in GSReg to see whether they still fall into the ERBB2 pathway and, if so, which of them show that the pathway is dominated by down-regulation in one of the conditions.