Gene set enrichment analysis using a curated gene list and cluster DE genes
5.3 years ago
asmariyaz23 ▴ 10

I have a curated gene list that I would like to test for enrichment among the DE genes of clusters obtained with Seurat. I first tried to do this manually with Fisher's exact test, like so:

No. genes in curated list: 5840 

No. DE genes in Cluster 0 (from Seurat): 512

No. Overlap genes: 209

No. Universe: 23,000

No. curated but not DE: 5840 - 209 = 5631

No. DE but not curated: 512 - 209 = 303

No. neither curated nor DE: 23000 - (5631 + 209 + 303) = 16857

The 2x2 contingency table is set up as follows (rows: curated / not curated; columns: DE / not DE):

209 5631
303 16857

The odds ratio looks off in this case, so I am wondering whether I designed the test correctly.
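For reference, here is a minimal sketch in R that rebuilds the table above from the counts in the post and prints the odds ratio. The variable names are illustrative only. The plain cross-product ratio for these counts comes out at roughly 2, which points toward enrichment; `fisher.test` reports a conditional maximum-likelihood estimate, so its value can differ slightly from the cross-product.

```r
# Counts taken from the post above (variable names are illustrative)
overlap      <- 209    # DE genes in cluster 0 that are also curated
curated_only <- 5631   # curated genes that are not DE (5840 - 209)
de_only      <- 303    # DE genes that are not curated (512 - 209)
neither      <- 16857  # remaining genes in the 23,000-gene universe

# matrix() fills column by column, so this reproduces the layout in the post:
# rows = curated yes/no, columns = DE yes/no
tab <- matrix(c(overlap, de_only, curated_only, neither), nrow = 2,
              dimnames = list(curated = c("yes", "no"), DE = c("yes", "no")))
print(tab)

# Plain cross-product (sample) odds ratio
sample_or <- (overlap * neither) / (curated_only * de_only)
print(sample_or)

# Fisher's exact test; $estimate is the conditional MLE of the odds ratio
ft <- fisher.test(tab)
print(ft$estimate)
print(ft$p.value)
```

An odds ratio above 1 means the curated genes are over-represented among the cluster's DE genes relative to the rest of the universe.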

Secondly, I was trying to find an R package (like fgsea) that would let me do this kind of analysis. My idea was to feed all DE genes in each cluster in as a custom pathway. But I am confused about the ranked list: what should that be? I am unable to figure out where the curated gene list fits into the equation. Alternatively, is there a better approach to address this issue?

RNA-Seq enrichment R • 2.1k views

I will try it this way as well; I just need clarification on two variables, N and k.

N = Is this the total number of genes in the matrix (after initial filtering in a single-cell package, in my case Seurat)?

k = Do you mean only the DE genes in the cluster of interest, or the total number of genes in the cluster?

Thank you again for your insight on this.


"The odds ratio looks off in this case"

Why do you think this?

5.3 years ago

I think you're going about it the wrong way. If you want to know the probability of finding the observed number of curated genes, or more, in a cluster of DE genes, you can cast this as an urn problem. The urn holds N genes, where N is the number of genes tested for differential expression. Of these N genes, m are marked as curated. You draw k genes (the number of genes in the cluster of interest), of which q are curated. The probability of getting q or more curated genes in the cluster just by chance is then given (in R) by phyper(q-1, m, N-m, k, lower.tail=FALSE).
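To make the urn formulation concrete, here is a minimal sketch that plugs the OP's numbers into the call above. It assumes N is the OP's 23,000-gene universe and k is the 512 DE genes reported for cluster 0; both choices are assumptions made for illustration, and N in particular should be the set of genes actually tested.

```r
# Assumed mapping of the OP's counts onto the urn quantities
N <- 23000  # genes in the urn (assumed: the OP's universe of tested genes)
m <- 5840   # genes marked as curated
k <- 512    # genes drawn (assumed: DE genes in cluster 0)
q <- 209    # drawn genes that are curated (the overlap)

# Overlap expected by chance under random drawing: k * m / N
expected <- k * m / N
print(expected)

# P(X >= q): q - 1 is used because lower.tail = FALSE gives P(X > q - 1)
p_enrich <- phyper(q - 1, m, N - m, k, lower.tail = FALSE)
print(p_enrich)
```

The expected overlap is about 130 genes against 209 observed, which is why the upper-tail probability is so small.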


The hypergeometric test (urn problem) is equivalent to the corresponding one-tailed version of Fisher's exact test. It is just a different way to think about the data and gives the same p-value, as the OP's data show:

> fisher.test(matrix(c(209,5631,303,16857),2,2), alternative="g")$p.value
[1] 8.277633e-15
> phyper(209-1,303+209,16857+5631,209+5631, lower.tail=FALSE)
[1] 8.277633e-15
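
The same comparison can be written as a small script. This is only a sketch: `enrichment_p` is a hypothetical helper name, not a function from any package, and the counts are the OP's. It uses the urn parametrisation (curated genes as the marked balls, cluster genes as the draw), which should agree with the call above because the hypergeometric distribution is symmetric in those two roles.

```r
# Hypothetical helper: one-sided enrichment p-value from the urn quantities
#   q = overlap, m = curated genes, k = genes in the cluster, N = genes tested
enrichment_p <- function(q, m, k, N) {
  phyper(q - 1, m, N - m, k, lower.tail = FALSE)
}

# Urn formulation with the OP's counts
p_hyper <- enrichment_p(q = 209, m = 5840, k = 512, N = 23000)

# One-sided Fisher's exact test on the same 2x2 table
tab <- matrix(c(209, 5631, 303, 16857), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))

# Relative difference between the two p-values (near zero if they agree)
rel_diff <- abs(p_hyper / p_fisher - 1)
print(rel_diff)
```

Wrapping the call in a function makes it easy to repeat per cluster; with several clusters, the resulting p-values would normally be corrected for multiple testing, for example with `p.adjust`.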

I know. I was trying to clarify things for the OP, who seemed confused by the GSEA approach.
