Hi,
I am shuffling reads from a ChIP-Seq experiment over some features in the genome. Since I do not know the original distributions, I am trying to do an observed-versus-expected calculation to see which features are enriched. Boiled down, this means I have counts of reads mapping to feature x; I then shuffle the reads over the entire genome and count how many reads map to feature x by chance. Unfortunately, I am not quite sure how to get a p-value from this.
I hope you can help me with this. Thank you very much!
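One common way to turn shuffled counts into a p-value is the empirical (permutation) p-value: repeat the shuffle many times, then ask how often a shuffled count for feature x is at least as large as the observed count. The sketch below assumes a one-sided enrichment test. The function names and the `simulate_null_counts` helper are hypothetical; the helper only stands in for a real genome-wide shuffle (for example, one done with a tool of your choice) by treating each read as landing in the feature with a fixed probability.

```python
import random


def empirical_p_value(observed, null_counts):
    """One-sided permutation p-value: fraction of shuffles with count >= observed.

    The +1 in numerator and denominator treats the observed value as one
    of the permutations, so the p-value can never be exactly zero.
    """
    exceed = sum(1 for c in null_counts if c >= observed)
    return (exceed + 1) / (len(null_counts) + 1)


def simulate_null_counts(n_reads, feature_fraction, n_shuffles, seed=0):
    """Hypothetical stand-in for real shuffles (an assumption, not real data).

    Each read falls inside the feature with probability `feature_fraction`,
    e.g. the fraction of the shuffled region that the feature covers.
    """
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_reads) if rng.random() < feature_fraction)
        for _ in range(n_shuffles)
    ]


# Illustrative numbers only: 1000 reads, feature covers 2% of the region,
# 999 shuffles, 35 reads observed in the feature.
null_counts = simulate_null_counts(n_reads=1000, feature_fraction=0.02, n_shuffles=999)
p = empirical_p_value(observed=35, null_counts=null_counts)
print(f"empirical p-value: {p:.4f}")
```

Note that the smallest p-value you can get with N shuffles is 1/(N+1), so the number of shuffles has to be large enough to survive whatever multiple-testing correction is applied across features.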
A small comment on this excellent answer: do you expect reads to come from the entire genome (since you are shuffling over it)? It may be worth restricting the shuffle to open chromatin regions.
Thanks for the help! Does the permutation test p-value still apply if I process the data first? Say I count the reads and log-transform the counts. Or, put better: does the formula still hold when I compare the values after some processing, as long as I apply the same processing to both the observed and the permuted data? I noticed that Bonferroni will be rather conservative, thanks for the advice. I will use FDR.
You can do the permutation test after you have processed the data, as long as the same processing is applied to both the observed and the permuted data.
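To illustrate why this works, here is a minimal sketch with made-up counts. A monotone transform such as log(x + 1), applied identically to the observed count and to every permuted count, does not change their ordering, so the empirical p-value stays the same. This assumes the transform is applied to the final per-feature statistic; if the processing happens before counts are aggregated, the result can differ. The second part shows a plain Benjamini-Hochberg adjustment as one common FDR procedure; the function names and all numbers are illustrative assumptions.

```python
import math


def empirical_p_value(observed, null_values):
    """One-sided permutation p-value with the usual +1 correction."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up counts for one feature: observed value and nine shuffled values.
observed = 35
null_counts = [18, 22, 25, 19, 36, 21, 17, 24, 20]

p_raw = empirical_p_value(observed, null_counts)
p_log = empirical_p_value(
    math.log(observed + 1), [math.log(c + 1) for c in null_counts]
)
print(p_raw == p_log)  # same ordering, hence same p-value

# Made-up per-feature p-values, adjusted for multiple testing.
feature_p_values = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(feature_p_values))
```

With these four p-values, Bonferroni would multiply each by 4, whereas Benjamini-Hochberg scales each by 4 divided by its rank, which is why it is less conservative when many features are tested.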
Thank you very kindly! You have been a great help.