I am reading a paper in which ChIP-Seq was used to identify the binding sites of a target TF in mice. Before de novo motif discovery, the authors filtered out peaks located in false-positive regions. (The authors claim that these false-positive peaks arose from low-complexity sequence.)
Since the study included ChIP-Seq on mice with the target TF knocked out, I guess the authors called peaks on the knockout strain against the input control, and all peaks from the KO mice should be false positives. These regions also seem to be quite consistent across other ChIP-Seq experiments.
So I am wondering: in a general ChIP-Seq experiment without a target TF knockout, how should the false-positive peaks be removed? And how do these false-positive peaks arise? Is it because the low-complexity regions have multiple copies in the genome? If that were the case, though, the false-positive peaks could be largely eliminated by using control non-ChIP DNA.
Thank you very much in advance for sharing your ideas.
Hi Chris, sorry for my late response. Thanks for pointing out the paper; it is very helpful for understanding the issue. However, I am looking at ChIP-Seq data from mice. Are you aware of similar resources for the mouse genome?
Sorry, I haven't seen anything similar for mice. Perhaps you could apply an approach similar to the one the authors used in that paper to publicly available mouse data? You could also screen for the presence of satellite or simple repeats, since those are probably the most likely to be present in those mis-assembled regions.
Anyway, as you say, using an input non-ChIP control gets rid of most of these types of false positives.
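If it helps, here is a rough sketch of the repeat screen I mentioned: drop any peak that overlaps an annotated satellite or simple-repeat interval. It assumes both peaks and repeats are available as BED-style (chrom, start, end) tuples with half-open coordinates. The function names and the toy coordinates are made up for illustration, and you would load your own peak calls and repeat annotation in practice.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(regions):
    """Sort and merge (chrom, start, end) regions per chromosome.

    Returns {chrom: (starts, ends)} with disjoint, sorted intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the previous interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([m[0] for m in merged], [m[1] for m in merged])
    return index


def overlaps(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged region that begins before the peak ends.
    i = bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def filter_peaks(peaks, repeat_regions):
    """Keep only peaks that do not overlap any repeat region."""
    index = build_index(repeat_regions)
    return [p for p in peaks if not overlaps(index, p[0], p[1], p[2])]


if __name__ == "__main__":
    # Toy coordinates, purely hypothetical.
    repeats = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
    peaks = [("chr1", 250, 350), ("chr1", 300, 400), ("chr2", 10, 50), ("chr3", 0, 10)]
    for peak in filter_peaks(peaks, repeats):
        print(*peak, sep="\t")
```

The same filter would work with a list of KO-derived peaks or any other set of suspect regions in place of the repeats. For real data sets an interval tool such as bedtools is probably the more practical route; this is only meant to show the logic.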