Question

Pathway analysis and Gene enrichment Analysis Queries

0

6.9 years ago

bioinforesearchquestions ▴ 370

Hi All,

I am working with RNA-seq samples and plan to do pathway analysis and gene enrichment analysis. I don't have much background in these analyses yet and am currently reading up on them. If you know of any useful resources, kindly share them with me.

First of all, why do we do pathway analysis and gene enrichment analysis?
I have a set of genes that are upregulated and downregulated between wild type and mutant. How do I get an enrichment score for the upregulated genes and for the downregulated genes?
How do I identify which pathways are enriched in the wild-type and mutant samples?
How do I identify which pathways are enriched in the upregulated or downregulated genes?

RNAseq Pathway analysis Enrichment score GSEA • 4.1k views

ADD COMMENT • link updated 6.9 years ago by dz2353 ▴ 120 • written 6.9 years ago by bioinforesearchquestions ▴ 370

score 3 · Answer 1 · 2018-12-04

3

6.9 years ago

dz2353 ▴ 120

Hi, maybe you can try Metascape (web-based). For pathway analysis, I used IPA, but it is commercial software.

ADD COMMENT • link 6.9 years ago by dz2353 ▴ 120

score 2 · Answer 2 · 2018-12-04

2

6.9 years ago

Kevin Blighe 89k

First of all, why do we do pathway analysis and gene enrichment analysis?

You could do your own background reading to understand this. Try searching for the keywords: ncbi enrichment pathway analysis

I have a set of genes that are upregulated and downregulated between wild type and mutant. How do I get an enrichment score for the upregulated genes and for the downregulated genes?

Perform the enrichment separately on each set, using the sign of the fold change to define up- and down-regulation.
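
As a rough illustration of what running the enrichment separately per direction can look like, here is a minimal over-representation sketch in Python using a hypergeometric test. The choice of test is an assumption on my part, not necessarily what any particular tool does internally, and the gene names, pathway, universe, and fold changes are hypothetical toy values.

```python
from math import comb

def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from a universe of M genes containing n pathway genes."""
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total

def enrich(gene_list, pathway, universe):
    """Return (overlap size, p-value) for one gene list against one pathway."""
    genes = set(gene_list) & universe
    members = pathway & universe
    overlap = len(genes & members)
    p = hypergeom_upper_tail(overlap, len(universe), len(members), len(genes))
    return overlap, p

# Hypothetical toy data: gene -> log2 fold change (here assumed mutant vs wild type)
log2fc = {"g1": 2.1, "g2": 1.5, "g3": 1.2, "g4": -1.8, "g5": -2.4, "g6": 1.1}
universe = {f"g{i}" for i in range(1, 21)}   # all genes tested (assumed 20)
pathway = {"g1", "g2", "g3", "g7"}           # hypothetical pathway membership

# Split by the sign of the fold change, then test each list on its own
up = [g for g, fc in log2fc.items() if fc > 0]
down = [g for g, fc in log2fc.items() if fc < 0]

up_overlap, up_p = enrich(up, pathway, universe)
down_overlap, down_p = enrich(down, pathway, universe)
print("up:", up_overlap, up_p)
print("down:", down_overlap, down_p)
```

In a real analysis the universe would be all genes that were actually tested, and the p-values would need multiple-testing correction across pathways.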

How do I identify which pathways are enriched in the wild-type and mutant samples?

There are different ways to do this. It could be the same as my answer to the previous point, or you could define a Z-score threshold for 'expressed' versus 'not expressed' (using the entire unfiltered dataset) and then perform the enrichment and/or pathway analysis separately on the genes passing the threshold in wild type and in mutant.
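
One way the Z-score idea could be sketched in Python is shown below. The log2(x + 1) transform, the cutoff of 0, and the expression values are all assumptions chosen for illustration; the post does not prescribe them.

```python
from math import log2
from statistics import mean, pstdev

def expressed_genes(expr, z_cutoff=0.0):
    """Return genes whose log2(x + 1) expression has a Z-score above z_cutoff.

    The Z-score is computed across all genes of one sample (unfiltered data).
    """
    logged = {g: log2(v + 1) for g, v in expr.items()}
    mu = mean(logged.values())
    sd = pstdev(logged.values())
    return {g for g, v in logged.items() if (v - mu) / sd > z_cutoff}

# Hypothetical expression values (e.g. FPKM) per sample
wild_type = {"g1": 120.0, "g2": 0.5, "g3": 45.0, "g4": 0.0, "g5": 300.0}
mutant = {"g1": 2.0, "g2": 80.0, "g3": 50.0, "g4": 0.0, "g5": 310.0}

wt_expressed = expressed_genes(wild_type)
mut_expressed = expressed_genes(mutant)

# Genes passing the threshold in only one condition
wt_only = wt_expressed - mut_expressed
mut_only = mut_expressed - wt_expressed
print(sorted(wt_only), sorted(mut_only))
```

Each of `wt_expressed` and `mut_expressed` could then be passed to the enrichment or pathway tool of your choice as its own gene list.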

How do I identify which pathways are enriched in the upregulated or downregulated genes?

Perform the pathway analysis separately on each set, using the sign of the fold change to define up- and down-regulation.

-----------------------------

Some resources to get you started:

STRING (web-based)
DAVID (web-based)
topGO (R)
KEGGprofile (R)

Kevin

ADD COMMENT • link 5.3 years ago by Kevin Blighe 89k

2

Adding to the list,

Command-line based: Gene Set Clustering based on Functional annotation (GeneSCF)

ADD REPLY • link 6.9 years ago by EagleEye 7.6k

2

I'm also going to recommend my very recent answer to a similar question for why we do enrichment analyses and how they work.

Other resources include clusterProfiler (R) and enrichR (web-based and R).

ADD REPLY • link 6.9 years ago by jared.andrews07 ★ 19k

0

Good answer on the other thread, jared - had not seen it. Thanks!

ADD REPLY • link 6.9 years ago by Kevin Blighe 89k

0

Hi Kevin, Sample1 is the mutant and Sample2 is the wild type. The list given to me (the cuffdiff output file) contains 680 genes. To check my understanding, I computed log2(Value_2/Value_1), i.e. log2(Wildtype/Mutant), and got the same logFC as in the cuffdiff output. As you suggested, I have now categorized the genes by log fold change.

For 110 genes, the logFC values are positive, ranging from 1.02 to 4.8, so these genes are downregulated in the mutant sample.
For 570 genes, the logFC values are negative, ranging from -9 to -1, so these genes are upregulated in the mutant sample. Is my understanding correct?
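
The sign convention described above can be checked with a small Python sketch. It assumes, as stated in this post, that value_1 is the mutant and value_2 is the wild type; the function name and the numbers are hypothetical.

```python
from math import log2

def direction_in_mutant(value_1, value_2):
    """Classify a gene from log2(value_2 / value_1) = log2(wild type / mutant).

    Assumes value_1 = mutant and value_2 = wild type, both non-zero.
    """
    lfc = log2(value_2 / value_1)
    if lfc > 0:
        return lfc, "down in mutant"   # higher in wild type
    if lfc < 0:
        return lfc, "up in mutant"     # higher in mutant
    return lfc, "unchanged"

# Hypothetical expression values: (mutant, wild type)
print(direction_in_mutant(10.0, 40.0))   # positive logFC
print(direction_in_mutant(64.0, 8.0))    # negative logFC
```

If the sample order were swapped (wild type as value_1), the labels would flip, so it is worth confirming the order in the cuffdiff output before interpreting the signs.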

I am planning to use GSEA. I have prepared three ranked gene list files (sorted by logFC, descending):

1) all 680 genes with their logFC values

2) the 570 upregulated genes with their logFC values

3) the 110 downregulated genes with their logFC values

Should I run GSEA separately on the upregulated and downregulated gene lists, or on the total gene list?
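
For reference, building the three ranked lists could look roughly like this in Python. The two-column, tab-separated layout (gene, score) follows the usual ranked-list (.rnk) convention for preranked GSEA; the gene names and values are hypothetical, and writing the strings to files is left out.

```python
def to_rnk(log2fc):
    """Return tab-separated 'gene<TAB>score' lines, sorted by score descending."""
    ranked = sorted(log2fc.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{gene}\t{score}" for gene, score in ranked)

# Hypothetical gene -> logFC values, here log2(wild type / mutant) as in the post
log2fc = {"GeneA": 4.8, "GeneB": -9.0, "GeneC": 1.02, "GeneD": -1.0}

full_list = to_rnk(log2fc)
positive_list = to_rnk({g: v for g, v in log2fc.items() if v > 0})  # down in mutant
negative_list = to_rnk({g: v for g, v in log2fc.items() if v < 0})  # up in mutant

print(full_list)
```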

ADD REPLY • link 6.9 years ago by bioinforesearchquestions ▴ 370

1

I would likely run all three lists, as you can make different statements about each. For the full list, you can say that enriched pathways are perturbed or deregulated, even if their genes are split between up- and downregulated. That still gives you something to hypothesize about, though actual effects would have to be measured more directly.

The up/down lists yield more direct observations. For instance, maybe many genes involved in calcium signaling are upregulated in the mutant, which might allow you to speculate something about the mutant phenotype. Perhaps something that could be easily experimentally validated.

Either way, running an additional list is easy, so there's no reason not to do all 3 sets.

ADD REPLY • link 6.9 years ago by jared.andrews07 ★ 19k

0

Thanks, Jared. I have done GSEA on all three, but I was not sure which one is most meaningful to interpret.

For instance, I ran GSEA on the upregulated gene list (570 genes) with the gene set database "Mouse_GOBP_AllPathways_no_GO_iea_October_01_2018_symbol.gmt". GSEA finished successfully. The GSEA report for the upregulated gene list shows:

100/648 gene sets are upregulated in phenotype na_pos

42 gene sets are significant at FDR < 25%

32 gene sets are significantly enriched at nominal pvalue < 1%

548/648 gene sets are upregulated in phenotype na_neg

35 gene sets are significantly enriched at FDR < 25%

24 gene sets are significantly enriched at nominal pvalue < 1%

What are na_pos and na_neg? Do they correspond to mutant and wild type? How do I know which is which?

How should I interpret these values?

ADD REPLY • link 6.9 years ago by bioinforesearchquestions ▴ 370
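
For readers following along: in a preranked GSEA run there are no sample phenotypes, so na_pos and na_neg refer to gene sets with a positive or negative enrichment score, i.e. gene sets concentrated at the top (positive scores) or bottom (negative scores) of the ranked list. Which condition that corresponds to depends on how the ranking metric was defined. A minimal Python sketch of a weighted running-sum enrichment score is given below; it is a simplified illustration under that assumption, not the GSEA software's implementation, and the genes and scores are hypothetical.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked: list of (gene, score) pairs sorted by score, descending.
    Hits step up by |score|**weight (normalised); misses step down by 1 / n_miss.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [abs(s) ** weight if g in gene_set else 0.0 for g, s in ranked]
    hit_total = sum(hits)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, es = 0.0, 0.0
    for h, (g, _) in zip(hits, ranked):
        running += h / hit_total if g in gene_set else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es

# Hypothetical ranked list, e.g. scored by log2(wild type / mutant)
ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0), ("f", -3.0)]

es_top = enrichment_score(ranked, {"a", "b"})      # set sits at the top of the list
es_bottom = enrichment_score(ranked, {"e", "f"})   # set sits at the bottom of the list
print(es_top, es_bottom)
```

With a log2(wild type / mutant) ranking, a positive score (na_pos) would point to gene sets higher in wild type and a negative score (na_neg) to gene sets higher in the mutant; with the opposite ratio the reading flips.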