Hello,
I have a situation similar to the one described in this post: Hypergeometric Test On Gene Set
I have 2 microarrays from 2 different conditions, which give me 2 different gene sets of differentially expressed transcripts.
Diff in Condition 1: 738
Diff in Condition 2: 1090
Overlap Condition 1 & 2: 453
Total Genes in array: 30941
I want to test the significance of the overlap between the 2 conditions. I use:
phyper(452, 738, 30203, 1090, lower.tail=FALSE)
[1] 0
Any idea why the p-value is 0? I tried based on this post "http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c"
phyper(overlap, list1, PopSize - list1, list2, lower.tail = FALSE)
Thanks
You should try using log.p=TRUE
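For reference, a minimal sketch of that call with the counts from the question. The variable names are mine, for illustration only. `log.p = TRUE` makes `phyper` return the natural log of the tail probability, which stays finite even when the probability itself is too small for a double.

```r
# Counts from the question; variable names are illustrative only.
overlap  <- 453
list1    <- 738
list2    <- 1090
pop_size <- 30941

# log of P(X >= overlap); q = overlap - 1 because lower.tail = FALSE gives P(X > q)
log_p <- phyper(overlap - 1, list1, pop_size - list1, list2,
                lower.tail = FALSE, log.p = TRUE)
log_p

# Back-transforming underflows to 0 once log_p drops below roughly -745
exp(log_p)
```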
I get:
Any idea what that means? p.value = 1E-1140 ?
It means e^-1140.21 (roughly 10^-495), not 1E-1140, since log is the natural log here.
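If you want the value in scientific notation, you can change the base by hand instead of calling `exp()`. A small sketch, taking the -1140.21 quoted above as given (the variable names are just for illustration):

```r
log_p <- -1140.21                 # natural-log p-value quoted in this thread

log10_p  <- log_p / log(10)       # change of base: ln(p) / ln(10) = log10(p)
exponent <- floor(log10_p)        # power of ten
mantissa <- 10^(log10_p - exponent)

# Print as mantissa and exponent, since the number itself cannot be stored as a double
sprintf("p is about %.1fe%d", mantissa, exponent)
```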
That number is still 0 in any calculator. My question is: why is the p-value so low? The overlap is not that large; it is ~50-70% of the genes. Is the 2x2 table constructed correctly?
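One way to check the table is to build it from the four counts and compare its margins with the arguments passed to `phyper`. This is only a sketch under the assumption that all 30941 genes on the array form the background. The row and column labels are made up for illustration.

```r
# Counts from the question; names are illustrative only.
overlap  <- 453
list1    <- 738
list2    <- 1090
pop_size <- 30941

# matrix() fills column by column: both, cond1 only, cond2 only, neither
tab <- matrix(c(overlap,
                list1 - overlap,
                list2 - overlap,
                pop_size - list1 - list2 + overlap),
              nrow = 2,
              dimnames = list(cond2 = c("diff", "not_diff"),
                              cond1 = c("diff", "not_diff")))
tab

# Margins should reproduce the phyper arguments:
colSums(tab)   # 738 and 30203 -> m and n
rowSums(tab)   # first entry 1090 -> k
sum(tab)       # 30941 -> total genes on the array
```

If the margins match, the call in the question is set up the same way as a one-sided test on this table.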
You're calculating the probability of the following scenario:
In a jar where ~2% of the balls are white, it would be extraordinarily rare to draw a sample in which 50-70% of the balls are white by chance alone, which is why your p-value is so low.
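To put numbers on the jar picture, here is a rough sketch using the counts from the question. It treats list 1 as the white balls and list 2 as the draw, which is my reading of the `phyper` call above. The variable names are illustrative.

```r
# Counts from the question; names are illustrative only.
overlap  <- 453
list1    <- 738     # "white balls" in the jar
list2    <- 1090    # number of balls drawn
pop_size <- 30941   # all balls in the jar

white_frac <- list1 / pop_size    # share of white balls, about 2%
expected   <- list2 * white_frac  # overlap expected by chance alone
fold       <- overlap / expected  # observed overlap relative to chance

round(c(white_frac = white_frac, expected = expected, fold = fold), 3)
```

Under these assumptions, chance alone would give an overlap of about 26 genes, so 453 is many times the expected value.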
That's why I think p-values in genomics are often meaningless. You get very small p-values even if the effect size is small, and this is a consequence of the large size of the data sets available (thousands of genes, millions of SNPs, etc.). By the way, I wouldn't say ~50-70% is a small overlap...