I have three overlapping sets and I want to find the probability of observing a larger intersection for 'A intersect B intersect C' than the one I see (in the example below, the probability of finding more than 135 elements common to sets A, B and C). For a two-set problem, I guess I would use a Fisher or chi-square test. Here is what I have attempted so far:
### Prepare a 3 way contingency table:
mytable <- array(c(135, 116, 385, 6256,
                   48, 97, 274, 9555),
                 dim = c(2, 2, 2),
                 dimnames = list(
                   Is_C = c('Yes', 'No'),
                   Is_B = c('Yes', 'No'),
                   Is_A = c('Yes', 'No')))
## test
mantelhaen.test(mytable, exact = TRUE, alternative = "greater")
Is this the right test (along with the current parameters) to determine what I want, or is there a more appropriate test for this?
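For comparison, here is a minimal sketch of one way to put a number on "an overlap at least this large by chance", assuming each set is a simple random sample of its observed size from the same universe of genes. That null is an assumption made for illustration and is not the hypothesis `mantelhaen.test` evaluates (it tests conditional association between the first two dimensions within strata of the third). The seed, the number of simulations, and the names `size_A`, `size_B`, `size_C`, `sim_overlap` and `p_emp` are arbitrary choices, not part of the original code.

```r
# Rebuild the 3-way table from the question (same counts, same layout).
mytable <- array(c(135, 116, 385, 6256,
                   48, 97, 274, 9555),
                 dim = c(2, 2, 2),
                 dimnames = list(Is_C = c("Yes", "No"),
                                 Is_B = c("Yes", "No"),
                                 Is_A = c("Yes", "No")))

n_genes  <- sum(mytable)                  # size of the gene universe
size_A   <- sum(mytable[, , "Yes"])       # number of genes in A
size_B   <- sum(mytable[, "Yes", ])       # number of genes in B
size_C   <- sum(mytable["Yes", , ])       # number of genes in C
observed <- mytable["Yes", "Yes", "Yes"]  # genes in A, B and C

# Expected triple overlap if membership in the three sets were independent.
expected <- n_genes * (size_A / n_genes) * (size_B / n_genes) * (size_C / n_genes)

# ASSUMPTION: each set is drawn uniformly at random from the same universe.
set.seed(1)     # arbitrary seed, for reproducibility only
n_sim <- 2000   # arbitrary number of simulated draws
sim_overlap <- replicate(n_sim, {
  a  <- sample.int(n_genes, size_A)
  b  <- sample.int(n_genes, size_B)
  cc <- sample.int(n_genes, size_C)
  length(intersect(intersect(a, b), cc))
})

# Empirical one-sided p-value (overlap >= observed), with a +1 correction
# so the estimate is never exactly zero.
p_emp <- (sum(sim_overlap >= observed) + 1) / (n_sim + 1)

print(c(observed = observed, expected = expected, p_emp = p_emp))
```

Note that this null ignores the pairwise overlaps between the sets, so a small `p_emp` only says the triple overlap is larger than expected under full independence; it does not say the overlap exceeds what the pairwise associations alone would produce.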
I was going to suggest you also post this at Cross Validated, but then I saw this! Glad Biostars is more responsive...
I'm interested to hear what others say as to whether mantelhaen is the right test here. Don't forget that if your sets are genomic intervals, the standard methods are less likely to apply because of the non-randomness of the genome. E.g., if all 3 of your datasets are likely to occur in gene bodies, then that is the underlying relationship, but it will make them appear to be co-occurring if you're considering the entire genome as the background.
Each set consists of a group of genes, and I'm trying to see if the overlap is significant. All the sets are drawn from the full complement of genes across the genome (~17k). Does that answer your question?
Can you tell us if you are looking for genomic overlap?