Hello,
I have an enrichment analysis that gives an unexpected result, and I was wondering whether I am doing it correctly.
I have found KEGG categories for all genes (Ref. Set) in the genome of my organism. When I add up the number of genes in each category, the total is 3426.
I then chose a subset of these genes (Test Set) based on a criterion of my choice, and would like to see if they are enriched for a specific pathway/function. Adding up the number of genes in each KEGG category for this subset gives a total of 98.
I then noticed that the biggest category in my "Test Set" is "Ribosome", with 12 genes (out of 98) being part of that category. When I look at my "Ref. Set", I see that overall in the genome 87 genes (out of 3426) are part of that category.
Just looking at ratios:
12/98 = 0.1224
87/3426 = 0.02539
It looks like that category is more common in the test set than in the reference set. Now for Fisher's exact test:
> fisher.test(matrix(c(12, 75, 86, 3253), 2, 2), alternative='less')$p.value
[1] 0.9999993
It looks like the p-value is very high, indicating no enrichment whatsoever. What am I missing here? Thank you.
I second the idea of a simple sanity check. It is easy to lay out the matrix in a different orientation from the one fisher.test() expects, so without a check you are quite likely to get things wrong.
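One way such a sanity check could look, as a rough sketch rather than a definitive recipe: build the table with labelled rows and columns, confirm that its margins reproduce the totals from the question, and compare both one-sided alternatives. The variable names are made up for illustration, and the layout assumes the test set is a subset of the reference set, so the second row holds the genes outside the test set. In a one-sided fisher.test() on a 2x2 table, alternative = "greater" asks whether the odds ratio is above 1 (enrichment in the first row), while alternative = "less" asks about depletion, which would explain a p-value close to 1 for a category that is over-represented.

```r
# Counts taken from the question; the variable names are illustrative.
test_in    <- 12     # test-set genes annotated as Ribosome
test_total <- 98     # all test-set annotations
ref_in     <- 87     # genome-wide Ribosome annotations
ref_total  <- 3426   # all genome-wide annotations

# Assumption: the test set is a subset of the reference set, so the
# second row counts the genes that are NOT in the test set.
tab <- matrix(
  c(test_in,          test_total - test_in,
    ref_in - test_in, (ref_total - test_total) - (ref_in - test_in)),
  nrow = 2, byrow = TRUE,
  dimnames = list(set      = c("test", "rest"),
                  category = c("ribosome", "other"))
)
print(tab)

# Sanity check: the margins must reproduce the totals quoted above.
stopifnot(sum(tab) == ref_total,
          sum(tab["test", ]) == test_total,
          sum(tab[, "ribosome"]) == ref_in)

# A sample odds ratio above 1 means the category is over-represented
# in the first row (the test set).
odds_ratio <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

p_greater <- fisher.test(tab, alternative = "greater")$p.value  # enrichment
p_less    <- fisher.test(tab, alternative = "less")$p.value     # depletion

# Cross-check the enrichment p-value against the hypergeometric upper tail.
p_hyper <- phyper(test_in - 1, ref_in, ref_total - ref_in, test_total,
                  lower.tail = FALSE)

print(c(odds_ratio = odds_ratio, p_greater = p_greater,
        p_less = p_less, p_hyper = p_hyper))
```

The labelled table has the same cell layout as the matrix in the question, so the counts themselves look consistent. If p_greater is small while p_less is close to 1, the direction of the alternative is the thing to revisit, not the orientation of the matrix.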