Imagine there are two species, A and B. A has 560 genes and B has 101 genes; 66 genes are shared between the two species. After differential expression (DE) analysis, we observed 18 DE genes in species A and 5 DE genes in species B, 2 of which are common to both species. Our question is whether this overlap of two genes is larger than expected by chance (does that make sense?).
What I do is estimate the probability of getting an overlap of 2 or more when randomly selecting 18 and 5 elements from two lists of 560 and 101 elements that share 66 elements. In R:
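library(lattice)  # provides histogram()

# Estimate how often two random DE lists share at least 2 genes.
# IDs 1..560 are the genes of species A; IDs 1..66 are the genes shared with B,
# and the 35 B-only genes get the non-positive IDs -34..0 (101 genes in B).
simulate_de_rate <- function(n_iter, n_rep = 1000) {
  result_vector <- numeric(n_rep)
  for (n in seq_len(n_rep)) {
    n_matched <- 0
    for (i in seq_len(n_iter)) {
      a <- sample(1:560, 18)   # random "DE" genes in A
      b <- sample(-34:66, 5)   # random "DE" genes in B
      if (length(intersect(a, b)) >= 2) {
        n_matched <- n_matched + 1
      }
    }
    # percentage of iterations with an overlap of 2 or more
    result_vector[n] <- n_matched / n_iter * 100
  }
  print(result_vector)
  histogram(result_vector)
}
simulate_de_rate(1000)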
So the probability of getting this overlap by chance is very low, ~0.3%. Is it valid to say that the overlap we see is not due to chance? Can you suggest a statistically more rigorous way to calculate this? Thanks
You probably want to take a look at the GeneOverlap package (Bioconductor).
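If you just want a quick check without installing anything, the sampling scheme from the question can also be evaluated exactly with base R's hypergeometric functions instead of by simulation. This is only a sketch under the question's own null model (18 genes drawn at random from the 560 in A, 5 drawn at random from the 101 in B, 66 genes shared); the function name `overlap_tail_prob` and its argument names are made up for illustration and are not part of any package.

```r
# Exact P(overlap >= observed) under the random-sampling model in the question.
# Hypothetical helper; all argument names are illustrative.
overlap_tail_prob <- function(n_a, n_b, n_shared, de_a, de_b, observed) {
  # k = how many of B's DE genes fall among the genes shared with A
  k <- 0:min(de_b, n_shared)
  # P(K = k): draw de_b genes from n_shared shared + (n_b - n_shared) B-only genes
  p_k <- dhyper(k, n_shared, n_b - n_shared, de_b)
  # P(overlap >= observed | K = k): A's de_a random genes hit those k shared genes
  p_tail <- phyper(observed - 1, k, n_a - k, de_a, lower.tail = FALSE)
  sum(p_k * p_tail)
}

p_value <- overlap_tail_prob(n_a = 560, n_b = 101, n_shared = 66,
                             de_a = 18, de_b = 5, observed = 2)
print(p_value * 100)  # as a percentage, comparable to the simulated rate
```

The result should be of the same order as the simulated rate, without the Monte Carlo noise. Note that this null model treats every gene as equally likely to be DE; if you know how many of the DE genes in each species actually lie among the 66 shared genes, a Fisher's exact test restricted to those shared genes may be the more natural comparison.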
Related, possible duplicate post:
Thanks, more than enough to figure out how to do it correctly :)