Hi All,
I am working on RNA-seq samples and am planning to do pathway analysis and gene set enrichment analysis. I don't have much background in these analyses yet, so I am currently doing some background reading. If you know any useful resources, kindly share them with me.
- First of all, why do we do pathway analysis and gene enrichment analysis?
- I have a set of genes that are upregulated and downregulated between wild type and mutant. How do I get an enrichment score for the upregulated genes and for the downregulated genes?
- How to identify which pathways are enriched in the wild type and mutant samples?
- How to identify which pathways are enriched in upregulated or downregulated genes?
Adding to the list,
Command-line based: Gene Set Clustering based on Functional annotation (GeneSCF)
I'd also recommend my recent answer to a similar question, which covers why we do enrichment analyses and how they work.
Other resources include clusterProfiler (R) and enrichR (web-based and R).
Good answer on the other thread, Jared; I had not seen it. Thanks!
Hi Kevin, Sample1 is the mutant and Sample2 is the wild type. According to the list given to me, there are 680 genes in that cuffdiff output file. Just to check my understanding, I computed log2(Value_2/Value_1), i.e. log2(wild type/mutant), and got the same log fold change as in the cuffdiff output. As you mentioned, I have now categorized the genes by log fold change.
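A minimal sketch of that check and split, in Python. The rows are made-up toy values, and the dictionary keys are an assumption chosen to mirror cuffdiff's value_1/value_2 columns, with sample 1 as mutant and sample 2 as wild type. One point worth keeping in mind: with log2(value_2/value_1), a positive value means higher expression in the wild type, so "upregulated" on this scale means up in wild type relative to the mutant.

```python
import math

# Toy rows loosely mirroring cuffdiff gene-level output (values are made up).
# sample 1 = mutant, sample 2 = wild type, as stated in the post.
rows = [
    {"gene": "GeneA", "value_1": 2.0, "value_2": 8.0},
    {"gene": "GeneB", "value_1": 10.0, "value_2": 2.5},
    {"gene": "GeneC", "value_1": 4.0, "value_2": 4.0},
]

def log2_fold_change(value_1, value_2):
    """log2(value_2 / value_1), i.e. log2(wild type / mutant) here.

    Zero expression values would raise an error; they are not handled in this sketch.
    """
    return math.log2(value_2 / value_1)

def split_by_direction(rows):
    """Split genes by the sign of log2(value_2 / value_1).

    Positive = higher in sample 2 (wild type); negative = higher in sample 1 (mutant).
    Genes with a fold change of exactly 0 go into neither list.
    """
    higher_in_2, higher_in_1 = [], []
    for row in rows:
        lfc = log2_fold_change(row["value_1"], row["value_2"])
        if lfc > 0:
            higher_in_2.append((row["gene"], lfc))
        elif lfc < 0:
            higher_in_1.append((row["gene"], lfc))
    return higher_in_2, higher_in_1

up_in_wt, up_in_mutant = split_by_direction(rows)
print(up_in_wt)      # [('GeneA', 2.0)]
print(up_in_mutant)  # [('GeneB', -2.0)]
```

In practice you would also filter on cuffdiff's significance call before splitting; that filter is left out here to keep the direction logic visible.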
Should I run GSEA separately on the upregulated and downregulated gene lists, or on the total gene list?
I would likely run all three lists, as each supports a different statement. For the full list, you can say that enriched pathways are perturbed or deregulated, even though their genes may be split between up- and downregulated. That still gives you something to hypothesize about, though actual effects would have to be measured more directly.
The up/down lists yield more direct observations. For instance, maybe many genes involved in calcium signaling are upregulated in the mutant, which might let you speculate about the mutant phenotype, perhaps in a way that could easily be validated experimentally.
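For a plain up or down list, one common way to ask "are calcium-signaling genes over-represented here?" is an over-representation test based on the hypergeometric distribution. Note that this is a different technique from GSEA, which works on a ranked list rather than a cut-off list. The sketch below is a minimal Python version; the example counts are hypothetical, and a real analysis would test many pathways and apply a multiple-testing correction (e.g. Benjamini-Hochberg).

```python
from math import comb

def hypergeom_upper_tail(k, n_list, n_set, n_background):
    """P(X >= k) for X ~ Hypergeometric(population=n_background, successes=n_set, draws=n_list).

    k            pathway genes found in your list
    n_list       size of your list (e.g. the upregulated genes)
    n_set        pathway genes present in the background
    n_background all genes that could have appeared in the list (e.g. all tested genes)
    """
    total = comb(n_background, n_list)
    upper = min(n_list, n_set)
    # Sum the probabilities of seeing k, k+1, ..., upper pathway genes in the list.
    tail = sum(
        comb(n_set, i) * comb(n_background - n_set, n_list - i)
        for i in range(k, upper + 1)
    )
    return tail / total

# Hypothetical counts: 12 of 570 upregulated genes fall in a 150-gene pathway,
# out of 20000 tested genes.
p_value = hypergeom_upper_tail(k=12, n_list=570, n_set=150, n_background=20000)
print(p_value)
```

Running the same function on the downregulated list (and, if desired, the combined list) gives one p-value per list per pathway, which matches the idea of making separate statements about each list.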
Either way, running an additional list is easy, so there's no reason not to do all 3 sets.
Thanks, Jared. I have run GSEA on all three, but I am not sure which one is more meaningful to interpret.
For instance, I ran GSEA on the upregulated gene list (570 genes), selecting the gene set database "Mouse_GOBP_AllPathways_no_GO_iea_October_01_2018_symbol.gmt". GSEA finished successfully. In the GSEA report for the upregulated gene list, I could see
What are na_pos and na_neg? Do they correspond to mutant and wild type? How can I tell which is which?
How should I interpret these values?
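Assuming the report came from a preranked run (where no class labels are supplied), na_pos and na_neg usually refer to the positive and negative ends of the ranked list rather than to named phenotypes. If the ranking metric was the cuffdiff log2(Value_2/Value_1) discussed above, the positive end would be genes higher in the wild type. The minimal Python sketch below computes the weighted running-sum enrichment score described in the GSEA method to show why: a positive score means the set's genes cluster near the top of the list, a negative score near the bottom. Gene names and scores are made up, and this omits the permutation-based significance and normalization (NES) that GSEA reports.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked:   list of (gene, score) sorted from most positive to most negative
    gene_set: set of gene names
    p:        weight exponent on |score| for hits (p=1 is the usual weighted form)
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [(g, s) for g, s in ranked if g in gene_set]
    n_hits = len(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list but not cover all of it")
    norm_hit = sum(abs(s) ** p for _, s in hits)
    if norm_hit == 0:
        raise ValueError("all hit scores are zero")
    running, best = 0.0, 0.0
    for gene, score in ranked:
        if gene in gene_set:
            running += abs(score) ** p / norm_hit  # step up on a hit, weighted by score
        else:
            running -= 1.0 / n_miss  # step down on a miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list: top = most positive metric, bottom = most negative.
ranked = [("G1", 3.0), ("G2", 2.0), ("G3", 1.0), ("G4", -1.0), ("G5", -2.0), ("G6", -3.0)]
print(enrichment_score(ranked, {"G1", "G2"}))  # positive: set sits at the top (na_pos end)
print(enrichment_score(ranked, {"G5", "G6"}))  # negative: set sits at the bottom (na_neg end)
```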