I have a hard time understanding how the p-value is calculated for each peak in MACS2. Can someone give a detailed example?
I have a hard time understanding how the p-value is calculated for each peak in MACS2. Can someone give a detailed example?
check out the section "Peak detection" here https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html
I cannot speak to broad peak detection but for the classic (e.g. MACS 1.4 codebase) basically it estimates the "background noise" using
λlocal = max(λBG, λ1k, λ5k, λ10k)
this tells you e.g. what is the statistical distribution of reads in "whole genome"(BG), 1k (reads in a 1kbp window), 5k (reads in a 5kbp window), and 10k (reads in a 10kb window). it takes the maximum of these which is basically like saying "i want to be robust to the maximum level of noise in these parameters". then, you scan using a sliding window across the whole genome, and you ask, what is the likelihood of seeing a peak with e.g. 1000 reads in your window given this poisson parameter λlocal
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Thanks for your response. I guess my question is, once you calculated λlocal, and you know there are x number of reads at a peak region, what statistical test is used to calculate whether the number of reads is enriched compared to the expected background (what statistical test is used to calculate p-value)?
I don't know how you'd refer to the test (maybe a poisson test or something, i'm not super statistically literate) but that calculation is summarized very nicely by this wikipedia plot, and shows what the poisson distribution is for different parameters of λ, and different values of K (which would be the number of reads) https://en.wikipedia.org/wiki/Poisson_distribution#/media/File:Poisson_pmf.svg
so e.g. if your lambda paramter is 4, and your K is 15, then that is on the far tail of the purple line, giving you a very small P value, and so you would conclude that observing K=15 reads in this window is very unlikely to correspond to the background distribution, so you'd conclude it's a peak (the p-value is the Y-axis). in a sillier example, if you had a normal distribution of peoples heights and you were looking for giants, you'd look at each person in your set of people, compare each person to your normal distribution, and if someone was like 20 feet tall, they'd be like 5+ standard deviations away from everyone else, so you'd say "oh, theres a peak" :)
Thanks. I found out the test is called the exact poisson test https://stats.stackexchange.com/questions/350745/help-me-understand-poisson-test