I have several lists of genes, say A, B and C. I want to test whether the overlap between A and B, and the overlap between A and C, are significant. I am planning to use a hypergeometric test in R.
Please help me verify whether this is correct:
phyper(q, m, n, k, lower.tail = FALSE, log.p = FALSE)
q: number of genes in the overlap between A and B (or A and C), minus 1
m: number of genes in A
n: total number of genes in the sample (the background) minus the number of genes in A
k: number of genes in B (or C)
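Here is a minimal sketch of the mapping above in R. It assumes the gene lists are character vectors and that `universe` (a hypothetical name) holds every gene that could have appeared in any list. The helper `overlap_pvalue` is also hypothetical. It computes P(overlap >= observed) with `phyper` and cross-checks the result against a one-sided `fisher.test` on the matching 2x2 table, which should give the same p-value. The gene IDs are made up.

```r
# Hypothetical helper: upper-tail hypergeometric p-value for the overlap
# between two gene sets drawn from a common background ("universe").
overlap_pvalue <- function(a, b, universe) {
  a <- intersect(unique(a), universe)   # assumption: restrict lists to the background
  b <- intersect(unique(b), universe)
  x <- length(intersect(a, b))          # observed overlap
  m <- length(a)                        # "white balls": genes in A
  n <- length(universe) - m             # "black balls": background genes not in A
  k <- length(b)                        # number drawn: genes in B
  # lower.tail = FALSE gives P(X > q), so q = x - 1 yields P(X >= x)
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

# Toy example with made-up gene IDs (not real data)
universe <- paste0("g", 1:1000)
A <- paste0("g", 1:100)
B <- paste0("g", 51:150)   # shares 50 genes with A

p_AB <- overlap_pvalue(A, B, universe)

# 2x2 table: rows = in A / not in A, columns = in B / not in B
tab <- matrix(c(50, 50, 50, 850), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

The same call with C in place of B gives the A vs C test.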
Also, what if there is no overlap? Do I still do the same thing but with q=0 or q=-1?
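A quick check of the two options in R, assuming the same parameterization as above. The sizes are illustrative only and not from the question. With x = 0 the rule q = x - 1 gives q = -1, and `phyper` then returns P(X >= 0), which is 1 (no evidence of enrichment). Using q = 0 would instead return P(X >= 1), the probability of at least one shared gene, which is a different quantity.

```r
# Illustrative sizes only (not from the question):
m <- 100   # genes in A
n <- 900   # background genes not in A
k <- 50    # genes in C
x <- 0     # observed overlap between A and C

# Same rule as before: q = x - 1, so q = -1 here, giving P(X >= 0) = 1
p_none <- phyper(x - 1, m, n, k, lower.tail = FALSE)

# q = 0 would instead give P(X >= 1), the chance of at least one shared gene
p_at_least_one <- phyper(0, m, n, k, lower.tail = FALSE)
```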
Thanks a lot for the help!
Is there any way to do this test for multiple lists? For example, say I want to test whether the gene lists from seven tests are more similar than expected by chance. I know that I can do 6 + 5 + ... + 1 = 21 pairwise tests, but it would seem more elegant to do a global test for overlap.
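One way to organize the pairwise approach, sketched under assumptions: the lists are stored in a named list `gene_lists` (hypothetical), all lists share one background `universe`, and the pairwise p-values are corrected for multiple testing with `p.adjust` (Benjamini-Hochberg here, an arbitrary choice). The helper `overlap_pvalue` is hypothetical. This is not a single global test across all lists; it only tabulates every pairwise hypergeometric test.

```r
# Hypothetical gene lists drawn at random from one background; IDs are made up
set.seed(1)
universe <- paste0("g", 1:2000)
gene_lists <- lapply(1:7, function(i) sample(universe, 150))
names(gene_lists) <- paste0("test", 1:7)

# Hypothetical helper: upper-tail hypergeometric p-value, P(overlap >= observed)
overlap_pvalue <- function(a, b, universe) {
  x <- length(intersect(a, b))
  m <- length(a)
  n <- length(universe) - m
  k <- length(b)
  phyper(x - 1, m, n, k, lower.tail = FALSE)
}

# All unordered pairs of lists: choose(7, 2) = 21 tests
pairs <- combn(names(gene_lists), 2)
res <- data.frame(
  list1 = pairs[1, ],
  list2 = pairs[2, ],
  overlap = apply(pairs, 2, function(p)
    length(intersect(gene_lists[[p[1]]], gene_lists[[p[2]]]))),
  p = apply(pairs, 2, function(p)
    overlap_pvalue(gene_lists[[p[1]]], gene_lists[[p[2]]], universe))
)

# Adjust for the number of pairwise tests, then sort by raw p-value
res$p_adj <- p.adjust(res$p, method = "BH")
res[order(res$p), ]
```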
What you're asking is unclear. What do you mean by "to test whether gene lists for seven tests are more similar than expected by chance"?
The test assesses how likely or unlikely the observed overlap between two sets is; there is no notion of similarity involved. Also, consider posting this as a new question. Unless you can make the connection to bioinformatics clear, this is probably a purely statistical question, best addressed on Cross Validated.