Fisher's exact test gives p-value 0
8.5 years ago
Adrian Pelin ★ 2.6k

Hello,

My situation is similar to the one described in the post "Hypergeometric Test On Gene Set".

I have 2 microarrays run under 2 different conditions, which give me 2 different gene sets of differentially expressed transcripts.

Diff in Condition 1: 738

Diff in Condition 2: 1090

Overlap Condition 1 & 2: 453

Total Genes in array: 30941

I want to test the significance of the overlap between the 2 conditions. I use:

phyper(452, 738, 30203, 1090, lower.tail=FALSE)

[1] 0

Any idea why the p-value is 0? I based the call on this post: "http://stats.stackexchange.com/questions/16247/calculating-the-probability-of-gene-list-overlap-between-an-rna-seq-and-a-chip-c"

phyper(overlap, list1, PopSize-list1, list2, lower.tail = FALSE)

Thanks

enrichment R fisher's exact test • 6.8k views

You should try using log.p = TRUE


I get:

phyper(452, 738, 30203, 1090, lower.tail=FALSE, log.p = TRUE)

[1] -1140.21

Any idea what that means? Is the p-value 1E-1140?


It means p = e^-1140.21 (roughly 10^-495), since the log here is the natural log.
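
To put the log p-value on a base-10 scale, one option is to divide it by log(10) and split the result into a mantissa and an exponent. This is a minimal sketch that assumes the phyper call from the question; the variable names are only illustrative.

```r
# Natural log of the upper-tail probability P(X > 452), as in the question
log_p <- phyper(452, 738, 30203, 1090, lower.tail = FALSE, log.p = TRUE)

# Exponentiating underflows: the smallest positive double is about 4.9e-324
underflows <- exp(log_p) == 0

# Convert to base 10, then split into mantissa and integer exponent
log10_p  <- log_p / log(10)
exponent <- floor(log10_p)
mantissa <- 10^(log10_p - exponent)

cat(sprintf("p is about %.2fe%d\n", mantissa, as.integer(exponent)))
```

Working on the log scale throughout avoids the underflow; the value is only turned into a readable string at the very end.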


That number is still 0 when using any calculator. My question is, why is the p-value so low? The overlap is not that great, it is ~50-70% of genes. Is the 2x2 table constructed correctly?
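
For reference, one way to lay out the 2x2 table from the counts in the question is sketched below. The cell values are derived by subtraction from the four numbers given, and the object names are illustrative. `fisher.test` with `alternative = "greater"` is the one-sided test for enrichment and should correspond to the upper hypergeometric tail computed with phyper.

```r
n_total   <- 30941  # genes on the array
n_cond1   <- 738    # differential in condition 1
n_cond2   <- 1090   # differential in condition 2
n_overlap <- 453    # differential in both

# Rows: condition 1 (diff / not diff); columns: condition 2 (diff / not diff)
tab <- matrix(
  c(n_overlap,           n_cond1 - n_overlap,
    n_cond2 - n_overlap, n_total - n_cond1 - n_cond2 + n_overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(cond1 = c("diff", "not_diff"),
                  cond2 = c("diff", "not_diff"))
)

# One-sided Fisher's exact test for over-representation of the overlap
ft <- fisher.test(tab, alternative = "greater")
print(tab)
print(ft$p.value)
```

The margins of `tab` should reproduce the inputs: row sums 738 and 30203, column sums 1090 and 29851.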


You're calculating the probability of the following scenario:

  • You have a jar of 30203 black balls and 738 white balls
  • You draw 1090 of them randomly without replacement
  • You count the white balls you have drawn and find 453; passing 452 with lower.tail=FALSE asks for the probability of drawing more than 452, i.e. 453 or more
  • The probability of drawing more than 452 white balls given your conditions is virtually zero
  • Conversely, the probability of drawing 452 or fewer white balls given your conditions is virtually one
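
The tail convention can be checked on a toy jar: with lower.tail = FALSE, phyper returns P(X > q), so passing the observed count minus one gives the probability of the observed count or more. The numbers below are made up for illustration and are not from the question.

```r
m <- 10; n <- 20; k <- 8   # white balls, black balls, draws (toy values)
q <- 4                     # observed number of white balls (toy value)

# P(X >= q), summed directly from the point probabilities
p_sum <- sum(dhyper(q:min(k, m), m, n, k))

# The same quantity via the upper tail: P(X > q - 1)
p_tail <- phyper(q - 1, m, n, k, lower.tail = FALSE)

print(all.equal(p_sum, p_tail))
```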

In a jar where ~2% of the balls are white, you would expect only about 26 white balls in a draw of 1090. Drawing 453 by chance alone would be extraordinarily rare, which is why your p-value is so low.
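
The gap between chance and observation can be put in numbers using the mean of the hypergeometric distribution. This is a small sketch using the counts from the question; the variable names are illustrative.

```r
n_total <- 30941; n_cond1 <- 738; n_cond2 <- 1090; n_overlap <- 453

# Expected overlap under random sampling (hypergeometric mean)
expected <- n_cond2 * n_cond1 / n_total

# Observed overlap relative to the expectation
fold <- n_overlap / expected

print(round(c(expected = expected, fold_enrichment = fold), 1))
```

With these counts the expected overlap is about 26 genes, so the observed 453 is roughly a 17-fold enrichment.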


"The overlap is not that great, it is ~50-70% of genes"

That's why I think p-values in genomics are often meaningless. You get very small p-values even when the effect size is small, as a consequence of the large data sets available (thousands of genes, millions of SNPs, etc.). By the way, I wouldn't say ~50-70% is a small overlap...
