Question

How to calculate enrichment P-value?


8.8 years ago

biostart ▴ 370

Hello,

Quick question: how do I calculate a P-value for the enrichment of my dataset in a certain genomic feature?

Using bedtools, I calculated that 5% of my dataset "A" intersects a genomic feature of interest, whereas for a random set of genomic regions of the same size the intersection would be 11%. So my dataset seems strongly depleted for this feature compared with the genome average. How do I calculate a P-value for this?

Thanks!
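
For reference, the 5% and 11% figures are the fraction of regions that overlap at least one feature interval. Below is a minimal pure-Python sketch of that count, assuming BED-style 0-based half-open coordinates and a 1 bp minimum overlap (roughly what counting the output lines of `bedtools intersect -u` and dividing by the number of input regions gives). `overlap_fraction` and the toy intervals are hypothetical, for illustration only.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def overlap_fraction(regions: List[Interval], features: List[Interval]) -> float:
    """Fraction of `regions` overlapping at least one interval in `features` by >= 1 bp."""
    # Per chromosome: feature starts (sorted) and the running maximum of their ends.
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in features:
        grouped.setdefault(chrom, []).append((start, end))
    index: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        starts, max_ends, running = [], [], -1
        for start, end in ivs:
            running = max(running, end)
            starts.append(start)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)

    if not regions:
        return 0.0
    hits = 0
    for chrom, start, end in regions:
        if chrom not in index:
            continue
        starts, max_ends = index[chrom]
        k = bisect_left(starts, end)  # features [0, k) start before this region ends
        # Half-open overlap: one of those features must also end after the region starts.
        if k > 0 and max_ends[k - 1] > start:
            hits += 1
    return hits / len(regions)


regions = [("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 5, 15), ("chr1", 100, 110)]
features = [("chr1", 8, 22), ("chr2", 15, 20)]
print(overlap_fraction(regions, features))  # 0.5: chr2:5-15 only abuts chr2:15-20
```

The running maximum of feature ends lets each region be checked with a single binary search, even when feature intervals are nested inside one another.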

ChIP-Seq

Updated 2.4 years ago by Ram 44k • written 8.8 years ago by biostart ▴ 370

Ram · Answer 1 · 2016-02-11


8.8 years ago

Amitm ★ 2.3k

A Fisher's exact test, maybe? I am not sure whether it would violate some statistical assumption, but a 2x2 contingency table seems the most straightforward way to go.

Col 1 (dataset A) -> 5 overlapping, 95 not overlapping

Col 2 (random regions) -> 11 overlapping, 89 not overlapping

You get the picture. You can quickly run it with an online calculator here: http://www.quantitativeskills.com/sisa/statistics/fisher.htm

or in your favourite software, e.g. R.
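
A minimal, stdlib-only Python sketch of the two-sided Fisher's exact test on the 2x2 table above, in case R is not at hand. It uses the common two-sided definition: sum the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one. `fisher_exact_two_sided` is a hypothetical helper; `scipy.stats.fisher_exact` or R's `fisher.test` are the usual library routes. The 5/95 and 11/89 cells only stand in for real counts if each set really had 100 regions, so plug in your actual counts.

```python
from math import comb
from typing import List


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Exact integer numerators; dividing by `total` gives float probabilities.
    probs: List[float] = [comb(row1, x) * comb(n - row1, col1 - x) / total
                          for x in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # Small relative tolerance so tables tied with the observed one are counted.
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: dataset A, random regions; columns: overlaps feature, does not overlap.
print(fisher_exact_two_sided(5, 95, 11, 89))
# Same proportions with every cell multiplied by 100:
print(fisher_exact_two_sided(500, 9500, 1100, 8900))
```

Read as counts out of 100, the first table is not significant at the 0.05 level. The second table has exactly the same proportions but 100 times the counts, and its p-value collapses to a vanishingly small number, which is the effect of inflating n that the replies below warn about.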

Updated 2.4 years ago by Ram 44k • written 8.8 years ago by Amitm ★ 2.3k


I guess I should also multiply all the values you listed by the absolute number of reads in the dataset (~10000)? When I do this, the P-value becomes very small. Is that expected for these values?

8.8 years ago by biostart ▴ 370


Please don't do this. Or, if you do, also do some sampling to show how wildly off your Fisher test p-values are.
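
A sampling-based p-value for the depletion can be sketched as follows: draw many null sets of random regions, count their overlaps, and report the fraction of draws whose overlap count is at least as low as the observed one (a one-sided test). In a real analysis each null draw would come from shuffling dataset A across the genome (for instance with `bedtools shuffle`) and re-intersecting it with the feature. Here that step is replaced by a hypothetical `coin_flip_null`, which assumes each region independently overlaps the feature with probability 0.11, purely to keep the example self-contained. With N draws the smallest reportable value is 1/(N+1), which makes sampling a useful reality check on very small analytic p-values.

```python
import random
from typing import Callable

# A null draw returns the overlap count for one random region set.
NullDraw = Callable[[random.Random], int]


def empirical_depletion_p(observed_hits: int, draw_null: NullDraw,
                          n_draws: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value: how often a random set overlaps the
    feature as rarely as (or more rarely than) the observed dataset."""
    rng = random.Random(seed)
    as_extreme = sum(draw_null(rng) <= observed_hits for _ in range(n_draws))
    # Add-one correction: the estimate can never be exactly 0.
    return (as_extreme + 1) / (n_draws + 1)


def coin_flip_null(n_regions: int, overlap_rate: float) -> NullDraw:
    """HYPOTHETICAL stand-in for 'shuffle the regions, re-intersect, count'.
    Assumes every region overlaps the feature independently with `overlap_rate`."""
    def draw(rng: random.Random) -> int:
        return sum(rng.random() < overlap_rate for _ in range(n_regions))
    return draw


# 5 of 100 regions overlap vs. an assumed 11% background rate.
print(empirical_depletion_p(5, coin_flip_null(100, 0.11)))
# 500 of 10000: no draw is that extreme, so the result is just 1 / (n_draws + 1).
print(empirical_depletion_p(500, coin_flip_null(10_000, 0.11), n_draws=200))
```

Swapping `coin_flip_null` for a function that actually shuffles and re-intersects the regions keeps the rest of the bookkeeping unchanged.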

8.8 years ago by russhh 5.7k


Hi,

I don't think using the number of reads in the dataset is a good idea. Fisher's test assumes that the observations are independent, whereas the reads form a fixed space from which you are sampling overlapping or non-overlapping reads, so those observations would not be independent.

8.8 years ago by Amitm ★ 2.3k


Fisher's exact test works on counts, not on percentages. To compare percentages, you should use the two-proportion z-test.
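
A minimal stdlib-only Python sketch of the pooled two-proportion z-test (two-sided, normal approximation, no continuity correction). Note that it still needs the number of regions behind each percentage, because the standard error depends on the sample sizes. `two_proportion_z_test` is a hypothetical name, and the 100- and 10000-region totals in the usage lines are placeholders for your real counts.

```python
from math import erfc, sqrt
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test, two-sided, normal approximation.

    x1 of n1 regions overlap the feature in group 1, x2 of n2 in group 2.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # overlap rate under H0: p1 == p2
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# 5% vs 11%, assuming (hypothetically) 100 regions per group.
print(two_proportion_z_test(5, 100, 11, 100))  # z about -1.56, p about 0.12
# Same percentages with 10000 regions per group.
print(two_proportion_z_test(500, 10_000, 1100, 10_000))
```

For fixed proportions, z grows with the square root of the sample size, which is why the p-value in the second call collapses even though the percentages are unchanged.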

8.8 years ago by Jean-Karim Heriche 27k