Hi!
This is not the first time this question has been asked, but I am still confused after reading the previous post.
Say I have two lists. List1 has 598 genes, List2 has 5500 genes, and the pool from which both lists are drawn contains 23000 genes in total (say).
I want to compute whether the overlap between the two lists, which is 89 genes, is significant or not.
I have two formulas:
method 1
phyper(overlap-1, list1, PopSize-list1, list2, lower.tail = FALSE, log.p = FALSE)
phyper(88, 598, 23000-598, 5500, lower.tail = FALSE, log.p = FALSE)
method 2
phyper(overlap, list1, PopSize, list2, lower.tail = FALSE, log.p = FALSE)
phyper(89, 598, 23000, 5500, lower.tail = FALSE, log.p = FALSE)
Which method should I use, and why?
I am really confused.
Thank you
This thread should help you: http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c
Also remember that the p-value is the probability, under the null hypothesis of random overlap, of obtaining a result at least as extreme as the one actually observed.
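To make the "at least as extreme" point concrete, here is a minimal sketch in R using the numbers from the question. The variable names are only illustrative. In phyper(q, m, n, k), m is the number of "successes" in the pool (List1), n is the number of remaining genes (PopSize - list1, not PopSize), and k is the number drawn (List2). With lower.tail = FALSE, phyper returns P(X > q), so passing overlap - 1 gives P(X >= overlap), which is the form used in method 1.

```r
# Numbers from the question; variable names are illustrative.
overlap  <- 89      # genes shared by both lists
list1    <- 598     # size of List1
list2    <- 5500    # size of List2
pop_size <- 23000   # size of the gene pool

# Upper tail: P(X >= overlap). lower.tail = FALSE gives P(X > q),
# hence q = overlap - 1.
p_upper <- phyper(overlap - 1, list1, pop_size - list1, list2,
                  lower.tail = FALSE)

# Cross-check: sum the point probabilities from the observed overlap
# up to the largest overlap that is possible.
max_overlap <- min(list1, list2)
p_check <- sum(dhyper(overlap:max_overlap, list1, pop_size - list1, list2))

# Overlap expected if List2 were drawn at random from the pool.
expected <- list2 * list1 / pop_size   # 143

print(p_upper)
print(all.equal(p_upper, p_check))
print(expected)
```

Note that with these particular numbers the expected overlap by chance is 143 genes, which is more than the observed 89, so the upper-tail p-value should come out close to 1. If the real question is whether the overlap is smaller than expected, the lower tail, phyper(overlap, list1, pop_size - list1, list2), would be the one to look at.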