I guess you folks have a lot of experience running MACS to find peaks/binding sites for a transcription factor. Today I ran a recent version of MACS (v1.4.2) on my Sox2 TF data using an input/control. I found 8000 peaks. But when I counted the number of input reads (6th column) and Sox2 reads (7th column) that overlap each Sox2 peak using coverageBed, I found something weird. The difference between the two read counts is sometimes just 1 or 2. How can it be a peak if the difference between the reads is just one?
You should read the MACS paper, and there's even a recent Nature Protocols article on using MACS. MACS calls peaks by comparing the read count in a window of 2 x bandwidth to the expected read count in that window as estimated from the control. But this background estimation depends on looking at three local windows of 1 kb, 5 kb, and 10 kb, and choosing the one that leads to the highest background estimate (if I remember correctly). Your interval above is only 255 bp. I think the default bandwidth for MACS is 300 bases. So to understand the MACS call, you'd be better off looking at how many reads there are in a 600-base window for your IP (if you used the defaults), and at how many reads you'd expect given the larger windows from your input sample.
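To make the "highest background estimate" idea concrete, here is a minimal sketch rather than MACS's actual code. It scales control read counts from several window sizes down to the size of the tested window, takes the largest of those rates (together with a genome-wide rate), and asks how surprising the IP count is under a Poisson model. The function names, the 600 bp window, and all counts are assumptions made up for illustration.

```python
import math


def local_lambda(control_counts, genome_lambda, target_size):
    """Most conservative expected read count for a window of target_size bp.

    control_counts maps a window size in bp to the number of control reads
    seen in a window of that size centred on the candidate peak.
    genome_lambda is the genome-wide expected count for target_size bp.
    """
    rates = [count * target_size / size for size, count in control_counts.items()]
    return max([genome_lambda] + rates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the upper tail directly."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First tail term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i
        if term <= total * 1e-16:  # remaining terms no longer matter
            break
    return min(total, 1.0)


# Hypothetical numbers: control reads in 1 kb, 5 kb and 10 kb windows.
lam = local_lambda({1000: 2, 5000: 20, 10000: 30}, genome_lambda=1.0, target_size=600)
p_value = poisson_sf(12, lam)  # 12 IP reads in the 600 bp window (made up)
print(lam, p_value)
```

The point of the sketch is that the control count inside the peak interval itself never enters the calculation directly, which is why comparing raw IP and input counts over the called interval can look misleading.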
You can also get a feel for which peaks to ignore from the p-values returned. If you look at the distribution of p-values returned by MACS (reported as -10*log10(p-value) in MACS 1.4), I bet you'll see that your example above, with a value of 50.77, is pretty low on the totem pole and well within the range of things to be ignored.
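One way to check where a given peak sits in that distribution is sketched below. It assumes a tab-separated peak table in which the seventh column holds the score, as I believe the MACS 1.4 peaks.xls file does; the helper names and the example rows are made up, so adjust the column index to whatever your file actually contains.

```python
def parse_scores(lines, score_column=6):
    """Collect the score column (0-based index) from tab-separated peak lines."""
    scores = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        try:
            scores.append(float(fields[score_column]))
        except (IndexError, ValueError):
            continue  # header row or malformed line
    return scores


def fraction_at_or_below(scores, value):
    """Fraction of peaks whose score is no higher than value."""
    if not scores:
        raise ValueError("no scores to compare against")
    return sum(1 for s in scores if s <= value) / len(scores)


# Made-up rows in the assumed layout; in practice read the lines from your file.
example = [
    "# comment line",
    "chr\tstart\tend\tlength\tsummit\ttags\tscore\tfold_enrichment\tFDR(%)",
    "chr1\t100\t355\t256\t120\t12\t50.77\t5.1\t3.2",
    "chr1\t1000\t1500\t501\t200\t90\t812.4\t30.2\t0.0",
]
scores = parse_scores(example)
print(fraction_at_or_below(scores, 50.77))
```

If most of your 8000 peaks score far above the one you are worried about, that peak is a candidate for filtering rather than evidence that the caller is broken.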
I may have misunderstood this answer, but as far as I am aware, "band width" no longer has any effect on peak calling, only on the estimation of "d" during the model building step.
"--bw The band width which is used to scan the genome for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building." - Manual 1.4.2
@Ian: You might be correct. I have run MACS using the shiftsize/sonicated fragment size as seidel suggested, but there is no difference at all. Something is fishy about MACS single-read peaks!
The other thing is that you will not get the correct number of reads by looking at the coverage of input reads, because MACS constructs a background lambda model. Also, MACS uses non-redundant reads that are extended according to "d". Hope that makes some sense! :)
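To see why raw coverageBed counts can differ from what the peak caller works with, here is a rough sketch of the two steps mentioned above: dropping redundant reads and extending each remaining read to length "d" before counting overlaps. This is a simplification and not MACS's own code. Keeping exactly one read per position and strand, the 0-based half-open coordinates, and all function names are assumptions for illustration.

```python
def dedupe(reads):
    """Keep one read per (chrom, five_prime, strand), preserving input order."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def extend(read, d):
    """Turn a read into a fragment of length d as a 0-based half-open interval."""
    chrom, five_prime, strand = read
    if strand == "+":
        return chrom, five_prime, five_prime + d
    # Minus-strand reads extend leftwards from their 5' end.
    return chrom, max(0, five_prime - d + 1), five_prime + 1


def count_overlaps(fragments, chrom, start, end):
    """Count fragments overlapping the half-open interval [start, end) on chrom."""
    return sum(
        1
        for f_chrom, f_start, f_end in fragments
        if f_chrom == chrom and f_start < end and f_end > start
    )


# Made-up reads: (chromosome, 5' position, strand). The first two are duplicates.
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 400, "-"), ("chr2", 100, "+")]
fragments = [extend(r, d=200) for r in dedupe(reads)]
print(count_overlaps(fragments, "chr1", 250, 260))
```

Counting unextended reads with duplicates left in, as a plain overlap count does, can give noticeably different numbers for the same interval.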
Make sure to visualize the wig files that MACS produces for both the IP and the control in a browser like UCSC or IGV. MACS will still call peaks even if the IP or enrichment didn't work in the experiment, but visual inspection of the peaks in that case shows that they are likely just noise. A working TF experiment should have very clearly defined, high peaks compared to the input. Compare them to real peaks from a successful experiment if you're not sure: the wig files for all of the ENCODE experiments are available from UCSC and make a good standard.
As always, you need to investigate the operation of the tool more closely. Make sure that the bandwidth the tool detects does indeed match the expected fragment size. It is quite possible that it collects data from further away from the site than the width you know of. Also make sure to visualize the bed file you get (to see the full width), not just the peak.