Calculating p-Value for an observed/expected Ratio
7.5 years ago
chrys ▴ 80

Hi,

I am shuffling reads from a ChIP-Seq experiment over some features in the genome. Since I do not know the original distributions, I am trying to compute an observed-to-expected ratio to see which features are enriched. Boiled down, this means I count the reads mapping to feature x, then shuffle the reads over the entire genome and count how many reads map to feature x by chance. Unfortunately, I am not quite sure how to get a p-value from this.

I hope you can help me with this. Thank you very much!

ChIP-Seq Reads • 4.0k views
7.5 years ago

This looks like a permutation test. Your p-value would be the number of permutations that give at least as many reads mapping to the feature under consideration as you observed, divided by the number of permutations. (Adding 1 to both the numerator and the denominator is a common adjustment that keeps the estimate from being exactly zero.) If you're doing multiple tests (i.e. testing many features), you should correct for multiple testing. Bonferroni is fine but often too conservative; you should prefer FDR.
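To make the recipe above concrete, here is a minimal sketch in plain Python. The function names and the example numbers are made up for illustration, and the FDR step uses the Benjamini-Hochberg procedure, which is one common choice of FDR correction and an assumption here, not something specified in the answer.

```python
def empirical_p_value(observed, permuted_counts):
    """One-sided permutation p-value for enrichment of a single feature."""
    # Count permutations whose read count is at least as large as the observed one.
    b = sum(1 for c in permuted_counts if c >= observed)
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return (b + 1) / (len(permuted_counts) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: one feature with 5 shuffles, then 4 features corrected together.
p_feature = empirical_p_value(10, [1, 2, 3, 10, 11])
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With only a handful of shuffles the smallest reachable p-value is 1 / (number of permutations + 1), so testing many features usually calls for many permutations.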


A small comment on this excellent answer: do you expect reads to come from the entire genome (since you are shuffling over it)? It may be worth restricting the shuffle to open chromatin regions.


Thanks for the help! Will the permutation test p-value also apply if I process the data? Say I count the reads and log-transform the counts. Or better: does the formula still hold when I compare the values after some sort of processing, as long as I apply the same processing to both the observed data and the permuted data? I noticed that Bonferroni will be rather conservative, thanks for the advice. I will use FDR.


Yes, you can do the permutation test after you have processed the data, as long as the observed and the permuted data go through the same processing.
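As a quick sanity check of this point, the sketch below applies a log transform to both the observed count and the shuffled counts and compares the resulting permutation p-values. For a strictly increasing transform such as the log, the comparisons between observed and permuted values do not change, so the p-value stays the same. The counts are simulated and purely hypothetical.

```python
import math
import random


def empirical_p_value(observed, permuted):
    """Fraction of permuted values at least as large as the observed one (+1 adjusted)."""
    b = sum(1 for value in permuted if value >= observed)
    return (b + 1) / (len(permuted) + 1)


# Hypothetical data: one observed count and 999 counts from shuffles.
rng = random.Random(0)
observed_count = 42
permuted_counts = [rng.randint(10, 50) for _ in range(999)]

# Same test on raw counts and on log-transformed counts.
p_raw = empirical_p_value(observed_count, permuted_counts)
p_log = empirical_p_value(
    math.log1p(observed_count),
    [math.log1p(c) for c in permuted_counts],
)
```

This only covers per-feature monotone transforms. Processing that mixes information across features (normalisation, for instance) has to be recomputed within every shuffle for the comparison to stay valid.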


Thank you very kindly! You have been a great help.

7.5 years ago
Protostome ▴ 50

Try Fisher's exact test. The contingency table could look something like this:

                                 By random    In your experiment
# Reads mapping to feature X         _                _
# Reads not mapping to X             _                _
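For a table like the one above, a one-sided Fisher's exact test can be sketched directly from the hypergeometric distribution. The function and argument names below are hypothetical, and the choice of a one-sided (enrichment) alternative is an assumption. The example numbers are toy values, not real read counts.

```python
from math import comb


def fisher_exact_greater(exp_on_x, rand_on_x, exp_off_x, rand_off_x):
    """One-sided Fisher's exact test: are experiment reads enriched on feature X?"""
    on_x = exp_on_x + rand_on_x          # row total: reads mapping to X
    off_x = exp_off_x + rand_off_x       # row total: reads not mapping to X
    exp_total = exp_on_x + exp_off_x     # column total: reads from the experiment
    denom = comb(on_x + off_x, exp_total)
    k_max = min(on_x, exp_total)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    numer = sum(
        comb(on_x, k) * comb(off_x, exp_total - k)
        for k in range(exp_on_x, k_max + 1)
    )
    return numer / denom


# Toy example: 3 experiment reads on X, 1 shuffled read on X,
# 1 experiment read off X, 3 shuffled reads off X.
p_enriched = fisher_exact_greater(3, 1, 1, 3)
```

Exact binomial coefficients get expensive for real read counts in the millions; in practice a library routine such as `scipy.stats.fisher_exact` with `alternative="greater"` is likely the more practical route.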

I thought about using Fisher's exact test, but wouldn't I have to run the test for every shuffle and then correct the resulting p-values, say by Bonferroni correction?


Yes, that makes sense

7.5 years ago
Ben ▴ 60

You should use the Poisson distribution to calculate the p-value of binding sites from ChIP-seq data.
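One way to read this suggestion is as an upper-tail Poisson test: take the expected read count for the feature as the Poisson rate and ask how likely a count at least as large as the observed one would be. The sketch below assumes that interpretation; the function name is hypothetical, and using the mean count from the shuffles as the rate is an assumption, not something stated in the answer.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-lam)   # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term               # add P(X = i)
        term *= lam / (i + 1)     # step to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical example: 9 reads observed on a feature where 2 are expected by chance.
p_value = poisson_upper_tail(9, 2.0)
```

Computing the tail as one minus the cumulative sum loses precision for very small p-values, so a library survival function is preferable when the observed count is far above the expected one.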
