Question

GSEA - Calculate the probability of each gene being enriched?

1

6.8 years ago

adnbps ▴ 10

I'm interested in constructing a gene module of genes that are co-expressed with gene A. To define this module, I calculate the correlation of gene A with all other genes, producing a vector of correlation coefficients:

R = {ra, rb, rc, ...}
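A minimal sketch of this step, assuming expression values are held as one list of per-sample measurements per gene. The toy `expr` values, the gene names, and the helper names `pearson` and `correlation_vector` are illustrative assumptions, not part of the original analysis. Gene A is left out of R here, since its correlation with itself is trivially 1.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_vector(expr, anchor):
    """Correlate the anchor gene with every other gene in expr."""
    ref = expr[anchor]
    return {gene: pearson(ref, values)
            for gene, values in expr.items() if gene != anchor}

# Hypothetical toy data: 4 genes measured in 5 samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with A
    "C": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as A rises
    "D": [2.0, 1.0, 4.0, 3.0, 5.0],   # noisy, mostly rises with A
}

R = correlation_vector(expr, "A")
# Ranked gene list of the kind passed to a pre-ranked GSEA run.
ranked = sorted(R, key=R.get, reverse=True)
print(ranked)
```

Sorting by the signed coefficient puts positively co-expressed genes first; ranking by absolute value instead would treat anti-correlated genes as part of the module.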

I think this module should be enriched in a particular pathway, which has the following set of genes:

G = {g1, g2, ..., gn}

When I test this hypothesis with GSEA, it is very significant. However, there are too many "leading edge" genes. What I would like to do is produce a list of genes that are statistically enriched in this module, above a certain level of confidence.

Is there a way to do this?

GSEA RNA-Seq Statistics gene module • 1.5k views

updated 5.7 years ago by Biostar 20 • written 6.8 years ago by adnbps ▴ 10

score 1 · Answer 1 · 2018-06-27

If I understand correctly, your rank metric is R, so your top-ranked gene is your gene of interest "A" itself, and you are using GSEA to test whether the genes of pathway G are concentrated at the top of that ranking. In that case, defining the "significance of enrichment" of a subset of genes within the module is probably going to be tough statistically (I cannot think of a good hypothesis to test!), because your ranking comes from correlation with gene A, while what you want to test is a gene-set overlap with G.

Have you thought about using the statistical significance of the correlation between gene A and each other gene, and drawing a cut-off on that significance to get your "list of genes"? For example, your final list could be the genes that are both "leading-edge" and have p-adj(correlation coef) <= 0.05. In R, cor.test (with its default Pearson method) gives you both the t statistic and the p-value.

Alternatively, you can generate a set of genes (say B) with significant co-expression (p-adj(correlation coef) <= 0.05) and do a simple Fisher's exact test for the overlap between B and G.
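The two suggestions above can be sketched roughly as follows. This is only an illustration under stated assumptions: the raw correlation p-values, the gene names, the pathway set G, and the `leading_edge` set are all made up, and in practice the p-values would come from something like cor.test. Benjamini-Hochberg is assumed as the adjustment method, and the overlap test is a one-sided Fisher's exact test (over-representation), computed as a hypergeometric upper tail with the set of all tested genes as the background.

```python
from math import comb

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def overlap_pvalue(universe, set_b, set_g):
    """One-sided Fisher's exact test for over-representation of G in B."""
    b = set_b & universe
    g = set_g & universe
    N, K, n = len(universe), len(g), len(b)
    k = len(b & g)
    # P(overlap >= k) under the hypergeometric null.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical raw p-values for the correlation of each gene with gene A.
raw_p = {
    "gene1": 0.001, "gene2": 0.002, "gene3": 0.004, "gene4": 0.010,
    "gene5": 0.20, "gene6": 0.35, "gene7": 0.50, "gene8": 0.65,
    "gene9": 0.80, "gene10": 0.95,
}
genes = list(raw_p)
padj = dict(zip(genes, bh_adjust([raw_p[g] for g in genes])))

# B: genes significantly co-expressed with gene A.
B = {g for g in genes if padj[g] <= 0.05}

# Hypothetical pathway gene set and GSEA leading-edge subset.
G = {"gene1", "gene2", "gene3", "gene7"}
leading_edge = {"gene1", "gene2", "gene7"}

# Option 1: leading-edge genes that also pass the correlation cut-off.
final_list = sorted(leading_edge & B)

# Option 2: Fisher's exact test for the overlap between B and G.
p_overlap = overlap_pvalue(set(genes), B, G)
print(final_list, p_overlap)
```

The choice of background matters for the overlap test: restricting the universe to genes that were actually measured and tested avoids inflating the significance.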