Question

Calculating p-Value for an observed/expected Ratio

1


7.6 years ago

chrys ▴ 80

Hi,

I am shuffling reads from a ChIP-Seq experiment over some features in the genome. Since I do not know the original distributions, I am trying to perform an observed-to-expected calculation to see which features are enriched. Boiled down, this means I have counts of reads mapping to feature x; I then shuffle the reads over the entire genome and count how many map to feature x by chance. Unfortunately, I am not sure how to get a p-value from this.

I hope you can help me with this. Thank you very much!

ChIP-Seq Reads • 4.1k views

updated 7.6 years ago by Ben ▴ 60 • written 7.6 years ago by chrys ▴ 80

2


7.6 years ago

Protostome ▴ 50

Try Fisher's exact test. The contingency table could look something like this:

	Random shuffle	Your experiment
# Reads mapping to feature X	_	_
# Reads not mapping to feature X	_	_
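
A minimal sketch of how such a 2x2 table could be tested, assuming a one-sided test for enrichment in the experiment. The function name `fisher_exact_greater` and the example counts are made up for illustration. It computes the hypergeometric upper tail directly with the standard library, which is only practical for modest counts; for real read counts a library routine such as `scipy.stats.fisher_exact` would be the usual choice.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = reads on feature X in the experiment, b = reads on X in the shuffle,
    c = reads off X in the experiment,       d = reads off X in the shuffle.
    Returns P(top-left cell >= a) with all margins held fixed.
    """
    on_x = a + b              # reads mapping to X in either column
    experiment = a + c        # reads in the experiment column
    total = a + b + c + d
    upper = min(on_x, experiment)
    # Hypergeometric probabilities for every table at least as extreme as the observed one
    numerator = sum(comb(on_x, k) * comb(total - on_x, experiment - k)
                    for k in range(a, upper + 1))
    return numerator / comb(total, experiment)

# Hypothetical counts: 30 of 1000 experimental reads hit X, 10 of 1000 shuffled reads do
p_value = fisher_exact_greater(30, 10, 970, 990)
print(f"one-sided p = {p_value:.3g}")
```

Note that this compares the experiment against a single shuffle; with many shuffles you would either pool them or use the shuffles as a null distribution directly.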


0


I thought about using Fisher's exact test, but wouldn't I have to run the test for every shuffle and then correct the resulting p-values, say with a Bonferroni correction?

7.6 years ago by chrys ▴ 80

0


Yes, that makes sense

7.6 years ago by Protostome ▴ 50

2


7.6 years ago

Ben ▴ 60

You should use the Poisson distribution to calculate the p-value of binding sites from ChIP-seq data.
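
One way this could look, as a sketch only: treat the read count on a feature as Poisson with mean `lam`, where `lam` is assumed here to be the expected count estimated from the shuffles (for example the mean count over shuffles), and take the upper tail at the observed count as the p-value. The function name `poisson_upper_tail` and the example numbers are hypothetical.

```python
from math import exp, lgamma, log

def poisson_upper_tail(k, lam, max_terms=10_000):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0.

    Sums the tail directly instead of computing 1 - CDF, so very small
    p-values are not lost to floating-point cancellation.
    """
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in k! and lam**k
    term = exp(k * log(lam) - lam - lgamma(k + 1))
    total = 0.0
    i = k
    for _ in range(max_terms):
        total += term
        i += 1
        term *= lam / i            # P(X = i) from P(X = i - 1)
        if i > lam and term < total * 1e-16:
            break                  # remaining terms are negligible
    return min(total, 1.0)

# Hypothetical numbers: 25 reads observed, 12.4 expected from the shuffles
p_value = poisson_upper_tail(25, 12.4)
print(f"Poisson p = {p_value:.3g}")
```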


score 4 · Accepted Answer · 2017-05-23

4


7.6 years ago

Jean-Karim Heriche 27k

This looks like a permutation test. Your p-value would be the number of permutations that give at least as many reads mapping to the feature under consideration as you observed, divided by the number of permutations. If you are doing multiple tests (i.e. testing many features), you should also correct for multiple testing. Bonferroni is valid but often too conservative; prefer an FDR-controlling procedure such as Benjamini-Hochberg.
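
A minimal sketch of both steps, assuming you already have the observed count and a list of counts from the shuffles for each feature. The function names and the example numbers are made up. The p-value here adds 1 to the numerator and the denominator, a common variant that avoids reporting p = 0 with a finite number of permutations; dropping the +1 gives the plain ratio described above.

```python
def permutation_pvalue(observed, permuted_counts):
    """Empirical one-sided p-value: share of permutations with count >= observed."""
    at_least = sum(1 for c in permuted_counts if c >= observed)
    # +1 correction counts the observed data as one of the permutations
    return (at_least + 1) / (len(permuted_counts) + 1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the original p-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical data: observed count and shuffle counts for three features
features = {
    "feature_a": (40, [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]),
    "feature_b": (13, [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]),
    "feature_c": (5,  [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]),
}
names = list(features)
raw = [permutation_pvalue(*features[n]) for n in names]
for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
    print(f"{name}: p = {p:.3f}, BH-adjusted = {q:.3f}")
```

With only a handful of shuffles the smallest attainable p-value is 1 / (number of permutations + 1), so in practice you would want many more permutations than in this toy example.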


1


A small comment on this excellent answer. Do you expect reads to come from the entire genome (since you are shuffling over it)? It may be worth restricting the shuffle to open chromatin regions.

7.6 years ago by e.rempel ★ 1.1k

1


Thanks for the help! Will the permutation test p-value also apply if I process the data? Say I count the reads and log-transform the counts. Or better: does the formula still hold when I compare the values after some sort of processing, as long as I apply the same processing to both the observed data and the permuted data? I noticed that Bonferroni would be rather conservative, thanks for the advice. I will use FDR.

7.6 years ago by chrys ▴ 80

0


You can do the permutation test after you have processed the data.
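
A small illustration of why this works for the log transform mentioned above, using made-up counts. Because the log is strictly increasing, applying it to both the observed and the permuted counts leaves their ordering unchanged, so the empirical p-value is identical. For processing that is not monotone the p-value can change, but the test is still a permutation test as long as exactly the same processing is applied to observed and permuted data.

```python
import random
from math import log1p

def empirical_p(observed, permuted):
    """Share of permuted values that are at least as large as the observed one."""
    return sum(1 for v in permuted if v >= observed) / len(permuted)

rng = random.Random(0)                                # fixed seed for reproducibility
permuted = [rng.randint(0, 40) for _ in range(1000)]  # made-up shuffle counts
observed = 35                                         # made-up observed count

p_raw = empirical_p(observed, permuted)
# Same statistic after log(1 + x) on both the observed and the permuted counts
p_log = empirical_p(log1p(observed), [log1p(v) for v in permuted])

print(p_raw == p_log)
```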

7.6 years ago by Jean-Karim Heriche 27k

0


Thank you very kindly! You have been a great help.

7.6 years ago by chrys ▴ 80