How quantitative is ChIP-seq? If it isn't, why do we need normalisation to per million mapped reads? Can we compare signal across samples if it is normalised to million mapped reads?
I am currently analysing ChIP-seq datasets, and I've been confused by the different attitudes I have come across regarding exactly how quantitative ChIP-seq is. As far as I am aware, it is qualitative, not quantitative: ChIP-seq lets you tell that a peak is present in one tissue but absent in another, but not how much more of your protein is present in one tissue than in the other. You need ChIP-Rx for the quantitative stuff.
If this is the case, then why in the Roadmap Epigenomics paper 'Integrative analysis of 111 reference human epigenomes' do they say:
"To avoid artificial differences in signal strength due to differences in sequencing depth, all consolidated histone mark data sets ... were uniformly subsampled to a maximum depth of 30 million reads (the median read depth over all consolidated samples). "
I understand why you might do this for replicates, but that is not what they are talking about here. I have only seen a few other papers normalise reads like this; I always thought it wasn't necessary, because you shouldn't be comparing samples quantitatively anyway.
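For my own understanding, here is a toy illustration of what I take "uniform subsampling" to mean: simply thinning each library down to a common total depth (made-up counts, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_counts(counts, target_depth):
    """Thin per-bin read counts down to ~target_depth total reads by keeping
    each read independently with probability target_depth / current depth."""
    depth = counts.sum()
    if depth <= target_depth:
        return counts.copy()            # already at or below the target
    return rng.binomial(counts, target_depth / depth)

# two libraries over the same bins, sequenced to quite different depths
deep    = rng.poisson(lam=60, size=1000)    # ~60,000 reads in total
shallow = rng.poisson(lam=31, size=1000)    # ~31,000 reads in total

target = 30_000
print(subsample_counts(deep, target).sum())     # ~30,000
print(subsample_counts(shallow, target).sum())  # ~30,000
```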
Why do so many programs offer the option to normalise signal to per million mapped reads if we can't compare quantitatively across samples anyway?
Let's say I have ChIP-seq'd H3K27me3 in stem cells and differentiated cells. I assume I am not simply allowed to take the signal per million mapped reads and "subtract" one from the other to see how much more or less of the mark is present in the differentiated cells?
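To make that concrete, this is roughly the calculation I am asking about (a minimal sketch with made-up per-region counts and library sizes, not real data):

```python
import numpy as np

# made-up read counts over the same regions in the two samples
stem_counts = np.array([120.,  40., 300.,  15.])
diff_counts = np.array([ 60., 200., 280.,  10.])

# hypothetical total mapped reads per library
stem_total = 25_000_000
diff_total = 40_000_000

# signal per million mapped reads for each region
stem_cpm = stem_counts / stem_total * 1e6
diff_cpm = diff_counts / diff_total * 1e6

# the "subtraction" (or log2 ratio) I am asking whether I may interpret
delta  = diff_cpm - stem_cpm
log2fc = np.log2((diff_cpm + 0.5) / (stem_cpm + 0.5))  # small pseudocount
print(delta)
print(log2fc)
```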
Any thoughts would be appreciated, thanks!
A few other things came to mind that are worth being aware of:
As with immunofluorescence and many other antibody-dependent methods, the level of the observed probe is not the same as the level of the target. All of these methods require the antigen to be equally well exposed, and that can potentially change between the compared samples, although a change in target density is usually the more likely explanation.
If possible, don't use the same sample for identifying regions (e.g. by peak calling) and for the subsequent quantification. That adds a stochastic contribution to one of the samples only: if you have two replicates, call peaks on one of them, and then calculate the fold difference between the two signals at those peaks, the replicate used for peak calling will tend to have the stronger signal at those regions, and you will get a skewed fold difference. So ideally, plan experiments to include at least one biological replicate for identifying peaks and another for quantification. The small simulation below illustrates the bias.
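A toy sketch (simulated data, not a real analysis): two replicates are drawn from the same underlying signal, "peaks" are taken as the highest bins of replicate 1, and the fold change at those bins then favours replicate 1 even though the two replicates are statistically equivalent.

```python
import numpy as np

rng = np.random.default_rng(1)

# the same true signal underlies both replicates; only sampling noise differs
true_signal = rng.gamma(shape=2.0, scale=5.0, size=10_000)
rep1 = rng.poisson(true_signal)
rep2 = rng.poisson(true_signal)

# "peak finding" on replicate 1: take the 500 bins with the highest rep1 counts
peaks = np.argsort(rep1)[-500:]

log2fc = np.log2((rep1[peaks] + 1) / (rep2[peaks] + 1))
print(f"mean log2(rep1 / rep2) at rep1-defined peaks: {log2fc.mean():.2f}")  # > 0

# picking the peaks on an independent replicate would remove this bias
```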
Most ChIP-seq that I see is not well suited for detecting global changes (recent developments in spike-in strategies are improving this). ChIP-seq is best at detecting local differences in signal, e.g. comparing two subgroups of enhancers to each other. You might have a perfect, strong IP, but if the target is uniformly distributed throughout the whole genome, it will not be possible to discriminate it from background. That also implies a limit to linearity: if a target comes to occupy a much larger fraction of the genome, the signal strength (and local enrichment) at the bound loci will appear reduced.
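As a rough sketch of the spike-in idea (made-up numbers, not a real pipeline): instead of scaling each sample by its own total read count, you scale by the number of reads mapping to the spike-in chromatin, so a genuine global change in the mark is not normalised away.

```python
import numpy as np

# made-up per-region counts and read totals; in "diff" the mark is globally
# halved, so relatively more of the library comes from the spike-in chromatin
samples = {
    #        per-region counts          own-genome reads  spike-in reads
    "stem": (np.array([500., 80., 300.]), 20_000_000,       1_000_000),
    "diff": (np.array([500., 80., 300.]), 20_000_000,       2_000_000),
}

for name, (counts, exp_total, spike_total) in samples.items():
    cpm  = counts / exp_total   * 1e6  # per million of the sample's own reads
    rrpm = counts / spike_total * 1e6  # per million of spike-in reads
    print(name, "CPM:", cpm, "spike-in scaled:", rrpm)

# CPM looks identical for the two samples and hides the global loss of the
# mark; the spike-in-scaled values are halved in "diff", recovering it.
```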