Question

False Positive Peaks In ChIP-Seq - How To Locate Them And How Do They Arise?

11.6 years ago

Tky ★ 1.0k

I am reading a paper in which ChIP-Seq was used to identify the binding sites of a target TF in mice. Before de novo motif discovery, the authors filtered out peaks located in false-positive regions. (The authors claim that these false-positive peaks arose from low-complexity sequence.)

Since the study included ChIP-Seq on mice with the target TF knocked out (KO), I guess the authors called peaks in the KO strain against the input control, and all peaks from the KO mice should be false positives. These regions also seem to be quite consistent across other ChIP-Seq experiments.

So I am wondering: in a general ChIP-Seq experiment without a knockout of the target TF, how should the false-positive peaks be removed? And how do these false-positive peaks arise? Is it because the low-complexity regions are present in multiple copies in the genome? If that were the case, though, the false-positive peaks should be largely eliminated by using control non-ChIP DNA.

Thank you very much in advance for sharing your ideas.

chip-seq • 5.2k views

score 2 · Answer 1 · 2013-06-04

11.6 years ago

Chris Whelan ▴ 590

You might want to check out this paper by Pickrell et al., "False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions":

http://www.ncbi.nlm.nih.gov/pubmed/21690102

They identified regions that are consistently called as peaks because they consistently get high coverage, which they think is due to misassembled regions in the reference. They provide a set of BED files you can use to filter out peaks in those regions.

In my experience, using a non-ChIP control usually gets rid of those peaks, but not always, so it's a good idea to check your peaks against those regions.
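As a rough illustration of that check, here is a minimal Python sketch that drops any peak overlapping a region from an exclusion list. It assumes both inputs are plain BED-style records (chromosome, 0-based start, exclusive end); the function names `parse_bed`, `build_index`, `overlaps` and `filter_peaks` are made up for this example and are not from the paper or any library. In practice a tool such as bedtools does the same job.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_bed(lines):
    """Parse BED-like lines into (chrom, start, end) tuples, skipping headers and blanks."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def build_index(regions):
    """Merge overlapping or touching regions per chromosome into sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end):
    """Return True if the half-open interval [start, end) overlaps an indexed region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last merged region that starts before the peak ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def filter_peaks(peaks, excluded_regions):
    """Keep only the peaks that do not overlap any excluded region."""
    index = build_index(excluded_regions)
    return [p for p in peaks if not overlaps(index, *p)]


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    peaks = parse_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t50\t150"])
    excluded = parse_bed(["chr1\t150\t300", "chr2\t150\t400"])
    for chrom, start, end in filter_peaks(peaks, excluded):
        print(chrom, start, end, sep="\t")
```

This version removes a peak on any overlap, even a single base. Requiring a minimum overlap fraction instead would be a reasonable alternative if the excluded regions are large relative to the peaks.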

Hi Chris, sorry for my late response. Thanks for pointing out the paper; it is very helpful for understanding the issue. However, I am looking at ChIP-Seq data from mice. Are you aware of similar resources for the mouse genome?

Reply written 11.6 years ago by Tky ★ 1.0k

Sorry, I haven't seen anything similar for mice. Perhaps you could apply an approach similar to the one in that paper to publicly available mouse data? You could also screen for satellite or simple repeats, since those are probably the most likely to be present in those misassembled regions.

Anyway, as you say, using an input non-ChIP control gets rid of most of these types of false positives.
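For the simple-repeat screen mentioned above, a crude first pass could look like the Python sketch below. It assumes you have already extracted the sequence under each peak, and it only detects tandem repeats with a unit of 1 to 6 bp repeated at least five times, so it will miss satellites with longer units. The names `simple_repeat_fraction` and `flag_low_complexity` and the 0.5 threshold are arbitrary choices for illustration. Overlapping peaks with existing repeat annotation for the mouse genome (for example RepeatMasker or simple-repeat tracks) would be the more thorough route.

```python
import re

# A unit of 1-6 bases followed by at least four more copies of itself.
SIMPLE_REPEAT = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def simple_repeat_fraction(seq):
    """Fraction of the sequence covered by short tandem repeats."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = sum(m.end() - m.start() for m in SIMPLE_REPEAT.finditer(seq))
    return covered / len(seq)


def flag_low_complexity(peak_seqs, max_fraction=0.5):
    """Return names of peaks whose repeat fraction exceeds max_fraction."""
    return [
        name
        for name, seq in peak_seqs.items()
        if simple_repeat_fraction(seq) > max_fraction
    ]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    peak_seqs = {
        "peak_repeat": "CA" * 10 + "ACGTTGCAGG",
        "peak_complex": "ACGTTGCAGGCTAAGCTTCGATCGGATCCA",
    }
    for name in flag_low_complexity(peak_seqs):
        print(name)
```

Flagged peaks are candidates for manual inspection rather than automatic removal, since a real binding site can sit next to a repeat.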

Reply written 11.6 years ago by Chris Whelan ▴ 590