I have a curated gene list that I would like to use for enrichment analysis of the DE genes in clusters obtained with Seurat. I first tried to do this manually with Fisher's exact test, like so:
No. genes in curated list: 5840
No. DE genes in Cluster 0 (from Seurat): 512
No. Overlap genes: 209
No. Universe: 23,000
No. curated only: 5840 - 209 = 5631
No. DE only: 512 - 209 = 303
No. in neither set (rest of universe): 23000 - (5631 + 209 + 303) = 16857
The 2x2 contingency table is laid out as follows:
                      DE in cluster 0   Not DE
In curated list       209               5631
Not in curated list   303               16857
The odds ratio looks off in this case, so I am wondering whether I set up the test correctly.
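To make the arithmetic easy to check, here is a minimal sketch in Python (standard library only) that rebuilds the table from the four totals and computes the sample odds ratio plus a one-sided hypergeometric tail probability. The variable names are illustrative, and the universe of 23,000 genes is assumed to be the right background, which is itself one of the open questions here.

```python
from math import comb

# Counts taken from the post; variable names are illustrative.
universe = 23000   # assumed background: all genes considered
curated = 5840     # genes in the curated list
de_genes = 512     # DE genes in cluster 0
overlap = 209      # genes in both sets

# Fill in the remaining cells of the 2x2 table.
list_only = curated - overlap                         # in list, not DE
de_only = de_genes - overlap                          # DE, not in list
neither = universe - (overlap + list_only + de_only)  # in neither set

# Sample (unconditional) odds ratio: (a * d) / (b * c).
odds_ratio = (overlap * neither) / (list_only * de_only)

# One-sided hypergeometric tail: probability of seeing at least `overlap`
# curated genes when drawing `de_genes` genes at random from the universe.
ways = sum(
    comb(curated, k) * comb(universe - curated, de_genes - k)
    for k in range(overlap, min(curated, de_genes) + 1)
)
p_value = ways / comb(universe, de_genes)

print(f"odds ratio = {odds_ratio:.3f}")
print(f"one-sided p = {p_value:.3g}")
```

With these counts the sample odds ratio comes out at about 2.06. R's fisher.test reports a conditional maximum likelihood estimate instead, so its value can differ slightly from this hand calculation.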
Secondly, I was looking for an R package (like fgsea) that would let me do this kind of analysis. My idea was to feed in all DE genes from each cluster as a custom pathway. But I am confused about the ranked list: what should it be? I can't figure out where the curated gene list fits into the equation. Alternatively, is there a better approach to this problem?
I will try it this way as well; I just need clarification on two variables, N and k.
N = Is this the total number of genes in the matrix (after the initial filtering in a single-cell package, in my case Seurat)?
k = Do you mean only the DE genes in the cluster of interest, or the total number of genes in the cluster?
Thank you again for your insight on this.
Why do you think this?