I would like to run an enrichment (overlap) analysis between two gene lists using the GeneOverlap package in R. I am comparing the gene IDs in each list, and there are no NAs.
This is the code I'm using:
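```r
library(GeneOverlap)

# Build the overlap object from the two gene ID vectors
go.obj <- newGeneOverlap(listA$ID, listB$ID)
go.obj

# Test the overlap (Fisher's exact test) and print the detailed result
go.obj <- testGeneOverlap(go.obj)
print(go.obj)
```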
Output:

```
Detailed information about this GeneOverlap object:
listA size=2765, e.g. 56163 60385 79882
listB size=9204, e.g. 9604 55585 56163
Intersection size=2765, e.g. 56163 60385 79882
Union size=9204, e.g. 56163 60385 79882
Genome size=23000
# Contingency Table:
      notA  inA
notB 13796    0
inB   6439 2765
Overlapping p-value=0e+00
Odds ratio=Inf
Overlap tested using Fisher's exact test(alternative=greater)
Jaccard Index=0.3
```
I need help understanding the resulting p-value (0) and odds ratio (Inf). Are these the result of something wrong in my input data, or do these results have a meaningful statistical interpretation?
What is it that you don't understand? You are testing the overlap between listA and listB, and listA is a subset of listB: every element of listA is also in listB (the intersection size, 2765, equals the size of listA). This makes the overlap highly significant, hence p-value = 0 (strictly, it is a very small number that gets rounded to zero; you can check this by running a Fisher's exact test yourself in R).
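The tiny p-value can be inspected on the log scale, where it does not underflow. A minimal sketch in base R, relying on the fact that a one-sided (alternative = "greater") Fisher's exact test on a 2x2 table is the upper tail of the hypergeometric distribution. The counts come from the GeneOverlap output above; the variable names are made up for illustration:

```r
# Counts taken from the GeneOverlap output (variable names are illustrative)
genome_size <- 23000
size_A      <- 2765   # listA
size_B      <- 9204   # listB
overlap     <- 2765   # intersection

# One-sided Fisher p-value = P(X >= overlap), where X is the number of listB
# genes obtained when drawing size_A genes from a genome of genome_size genes
log_p <- phyper(overlap - 1, m = size_B, n = genome_size - size_B, k = size_A,
                lower.tail = FALSE, log.p = TRUE)

log_p            # natural-log p-value: a large negative number
log_p / log(10)  # the same on the log10 scale
exp(log_p)       # underflows to 0 in double precision, hence "p-value=0e+00"
```

Because the overlap equals the size of listA (the largest value possible), the upper tail reduces to the single term `dhyper(overlap, size_B, genome_size - size_B, size_A)`.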
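The odds ratio of this test measures whether membership in the two lists is independent: under independence the odds ratio is 1. Your table has a zero cell (no gene is in listA but not in listB), so the cross-product ratio (13796 × 2765) / (0 × 6439) divides by zero and is infinite. An infinite odds ratio means the lists are strongly dependent, as one is contained in the other.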
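The infinite odds ratio can be reproduced by hand from the contingency table. A small sketch in base R; the object names are illustrative. Note that `fisher.test` reports the conditional maximum-likelihood estimate rather than this sample cross-product ratio, but both are infinite for this table.

```r
# Contingency table as printed by GeneOverlap
# (rows: notB/inB, columns: notA/inA; matrix() fills column by column)
tab <- matrix(c(13796, 6439, 0, 2765), nrow = 2,
              dimnames = list(c("notB", "inB"), c("notA", "inA")))

sum(tab)   # equals the genome size, 23000

# Sample (cross-product) odds ratio: (notB,notA * inB,inA) / (notB,inA * inB,notA)
or_sample <- (tab["notB", "notA"] * tab["inB", "inA"]) /
             (tab["notB", "inA"]  * tab["inB", "notA"])
or_sample  # Inf: the notB/inA cell is 0, so the denominator is 0
```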
> Do these results have a meaningful statistical interpretation?
Oh yes! Your result is extremely significant: the p-value is effectively 0, and the infinite odds ratio indicates the strongest possible association. I ran a Fisher's exact test on my end:
```
> x
      [,1] [,2]
[1,] 13796    0
[2,]  6439 2765
> fisher.test(x)

        Fisher's Exact Test for Count Data

data:  x
p-value < 2.2e-16
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1604.22     Inf
sample estimates:
odds ratio
       Inf
```
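As you can see, the 95% confidence interval is also extremely high and does not contain 1, showing that membership in the two lists is very strongly dependent (not independent). R prints any p-value below machine epsilon as "< 2.2e-16", which is consistent with GeneOverlap's 0. Note also that `fisher.test(x)` runs a two-sided test by default, whereas GeneOverlap uses `alternative = "greater"`; the conclusion is the same here.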
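To reproduce the one-sided test that GeneOverlap reports, `fisher.test` accepts `alternative = "greater"`. A sketch that runs both alternatives on the same table, rebuilding `x` from the printed matrix (the other object names are illustrative):

```r
# The 2x2 table from the answer (matrix() fills column by column)
x <- matrix(c(13796, 6439, 0, 2765), nrow = 2)

two_sided <- fisher.test(x)                           # R's default alternative
one_sided <- fisher.test(x, alternative = "greater")  # as used by GeneOverlap

two_sided$p.value    # underflows in double precision; printed as "< 2.2e-16"
one_sided$p.value    # likewise effectively 0, matching "p-value=0e+00"
one_sided$estimate   # conditional MLE of the odds ratio: Inf
one_sided$conf.int   # one-sided interval, upper bound Inf
```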