Question

Analyzing ChIP-seq without input

5.9 years ago

Assa Yeroslaviz ★ 1.9k

I was wondering if anyone can tell me what the advantages and disadvantages are of doing a ChIP-Seq experiment with input samples (as opposed to no control samples).

We had a discussion today about whether or not it makes sense to use input samples in the normalisation process, and what it would mean to have none.

I would like to hear the opinions of others. If anyone knows of a paper on this topic, I would really appreciate it.

Do I need much higher coverage (read count) for my IPed samples if no control samples are available?

How many reads do you aim for when doing ChIP-Seq, e.g. with mouse?

Thanks, Assa

ChIP-Seq normalisation • 6.0k views

Thanks a lot for both answers (@ATpoint, @jared.andrews07). What, then, would be the advantage of having a data set with no controls? Why does MACS need the option to analyze data without input samples? Can these samples cause a bias in my results?

I am aware of the problem with stating a number of reads, especially with ChIP-Seq, as it also depends on the width of the peaks. But would an experiment with e.g. IP and input samples at 25 million reads each (all in triplicate) be better than a data set of only IP samples at 50 million reads each?

5.9 years ago by Assa Yeroslaviz ★ 1.9k

What, then, would be the advantage of having a data set with no controls?

There is none.

I am aware of the problem with stating a number of reads, especially with ChIP-Seq, as it also depends on the width of the peaks. But would an experiment with e.g. IP and input samples at 25 million reads each (all in triplicate) be better than a data set of only IP samples at 50 million reads each?

I am not sure you really need replicated controls; we typically do one IgG control and that is it. It is better to have more IP replicates with fewer reads than fewer replicates with more reads. Replicates are what give you power in differential analysis, depth not so much.
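To put numbers on the designs being compared, here is a small arithmetic sketch. The first two designs are the ones from the question; the third (five IP replicates plus a single control) is a hypothetical layout chosen only to show how the same sequencing budget could be spent on more replicates. The function and design names are made up for illustration.

```python
def total_reads_million(n_ip, ip_depth_m, n_ctrl=0, ctrl_depth_m=0):
    """Total sequencing budget in millions of reads for one design."""
    return n_ip * ip_depth_m + n_ctrl * ctrl_depth_m


designs = {
    # From the question: triplicate IP and triplicate input, 25M reads each.
    "IP + input, 25M each": total_reads_million(3, 25, n_ctrl=3, ctrl_depth_m=25),
    # From the question: triplicate IP only, 50M reads each.
    "IP only, 50M each": total_reads_million(3, 50),
    # Hypothetical: more IP replicates, one shared control.
    "5 IP + 1 control, 25M each": total_reads_million(5, 25, n_ctrl=1, ctrl_depth_m=25),
}

for name, total in designs.items():
    print(f"{name}: {total}M reads in total")
```

All three layouts cost the same number of reads, so the choice is about how the budget is split rather than how large it is.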

5.9 years ago by ATpoint 88k

score 5 · Answer 1 · 2019-07-03

A lot of these questions are really going to vary depending on your experimental setup. Antibodies used, target protein (TF or histone), and binding density of the target are all going to play a big part in how your ChIP-seq goes.

Input samples are used so that regions with high levels of background binding (artifacts) are ignored during the analysis, decreasing the number of false positives during peak calling, differential binding, etc. You can, of course, do an experiment without them, but they are typically worth doing. Regardless, you should remove/ignore peaks that fall within the ENCODE blacklisted regions, as they nearly always have very high (artificial) signal. Most of them are near centromeres and extremely repetitive regions. Input samples also help remove background signal that might differ between your different samples if they are from different tissues or subjected to different conditions.
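As a rough illustration of the blacklist step, the sketch below drops every peak that overlaps a blacklisted interval. The coordinates are made up, the function names are hypothetical, and intervals are assumed to be BED-style (0-based, half-open). It is a naive scan meant to show the logic, not a replacement for a proper interval tool.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_blacklisted(peaks: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only peaks that overlap no blacklisted region (simple O(n*m) scan)."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Made-up coordinates, purely for illustration.
peaks = [("chr1", 100, 300), ("chr1", 5000, 5200), ("chr2", 100, 300)]
blacklist = [("chr1", 250, 1000)]

kept = drop_blacklisted(peaks, blacklist)
print(kept)
```

On real data the same filter is usually done with an interval tool, for example `bedtools intersect -v -a peaks.bed -b blacklist.bed` (file names here are placeholders).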

Assuming your ChIP actually worked well, you shouldn't need higher coverage - peaks should be clear regardless. That said, if your antibody isn't great or you are trying to ChIP a particularly difficult factor, input samples help a lot in differentiating peaks of low magnitude from background.
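To make the point about low-magnitude peaks concrete, here is a minimal sketch of one way an input can be used: scale the input count in a window to the IP library depth, treat that as the expected background, and ask how surprising the IP count is under a Poisson model. Peak callers such as MACS use a more elaborate version of this idea; the code below is not their implementation. All counts and depths are invented, and the function name is hypothetical.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Intended for modest per-window counts; extreme values can underflow.
    """
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    # P(X = k), computed in log space to avoid overflow in lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = term
    i = k
    while term > 0.0:
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
        if i > lam and term < total * 1e-16:
            break  # remaining terms no longer change the sum
    return min(total, 1.0)


# Invented numbers for a single genomic window.
ip_count, input_count = 30, 20
ip_depth, input_depth = 25_000_000, 50_000_000

# Expected background in the IP library, from the depth-scaled input.
background = input_count * ip_depth / input_depth
fold_enrichment = ip_count / background
p_value = poisson_upper_tail(ip_count, background)

print(f"background={background}, fold={fold_enrichment}, p={p_value:.3g}")
```

Without an input, the background rate has to be guessed from the IP sample itself, which is exactly where weak peaks become hard to separate from noise.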

The reads question is really variable, but most experiments use between 10 and 20 million reads. You may even get away with as few as 5 million if the antibody is robust and binding isn't widespread.

score 2 · Answer 2 · 2019-07-03

I was wondering if anyone can tell me what the advantages and disadvantages are of doing a ChIP-Seq experiment with input samples (as opposed to no control samples).

IgG controls help you distinguish true and specific binding of your antibody from unspecific binding. That is the advantage. The only disadvantage is basically that it requires a bit more effort and money.

We had a discussion today about whether or not it makes sense to use input samples in the normalisation process, and what it would mean to have none.

I find inputs useful during peak calling, but that's pretty much it. They are not required during normalization for differential analysis. Peak callers such as MACS will scale treatment to control on a per-million basis.
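For readers unfamiliar with per-million scaling, a minimal sketch is below. It shows plain counts-per-million arithmetic on invented numbers so that a treatment and a control library of different depths become comparable. It is not taken from MACS, and the function name is hypothetical.

```python
def counts_per_million(count: float, library_size: int) -> float:
    """Scale a raw read count to reads per million sequenced reads."""
    return count * 1_000_000 / library_size


# Invented numbers: the same region in a 25M-read IP and a 50M-read control.
treatment_cpm = counts_per_million(50, 25_000_000)
control_cpm = counts_per_million(20, 50_000_000)

print(f"treatment: {treatment_cpm} CPM, control: {control_cpm} CPM")
```

After scaling, the raw counts of 50 and 20 no longer mislead: the deeper control library contributes proportionally less signal per million reads.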

I would like to hear the opinions of others. If anyone knows of a paper on this topic, I would really appreciate it.

See the ENCODE guidelines and the manuals and papers of differential analysis software such as csaw.

How many reads do you aim for when doing ChIP-Seq, e.g. with mouse?

Again, ENCODE standards are a good start. It depends strongly on what you ChIP and on the antibody quality. For a decent antibody against a TF or histone modification, something in the range of 25 million reads is what people typically go for. As sequencing on e.g. NovaSeq becomes more and more cost-effective, this should not be too much of an issue nowadays. Replication is much more important than depth for decent statistical power and for confidence that apparent changes are not due to poor IP efficiency.

score 2 · Answer 3 · 2019-07-03

Interesting article to read:

ChIP-Seq: Technical Considerations for Obtaining High Quality Data

"An important part of ChIP-Seq experimental design is determining which controls to use. Artifacts may arise in the following steps of experimentation. First, in terms of chromatin fragmentation, open chromatin regions are easier to shear than are closed chromatin regions and thus the former may be associated with higher background signals [9]. Second, antibodies may have unrecognized cross-reactivity. Third, there may be variable sequencing efficiency of DNA regions with different base compositions. Although both nonspecific immunoglobulin G (IgG) antibodies and input chromatin have been used as controls, IgG may be less desirable in certain circumstances because of the following reasons: most IgG antibodies are not obtained from true preimmune serum from the same animal in which the specific antibody was raised; and IgG antibodies usually immunoprecipitate much less DNA than specific antibodies do, and thus limited genomic regions from the control may be overamplified during the library construction step. In this case, the resulting sequence 'reads' will not cover the genome as sufficiently as a background model would for peak identification. Therefore, input chromatin serves as a better control for bias in chromatin fragmentation and variations in sequencing efficiency; additionally, it provides greater and more evenly distributed coverage of the genome."