I've been discussing ChIP-seq normalization methods with colleagues, and they introduced me to a normalization approach that involves calculating scaling factors by dividing a ratio of input spike-in counts by a ratio of IP spike-in counts. This method is new to me, and I'm trying to gain a better understanding of its validity and potential use in ChIP-seq analyses.
Specifically, we take the following steps for each sample's input and immunoprecipitate (IP), both of which have been spiked with chromatin from a different species:
First, tally the numbers of genome-wide quality-checked (QC'd) alignments for input, immunoprecipitate (IP), spike-in input, and spike-in IP
## R pseudocode ##
main_input  <- NA  # integer count of QC'd genome-wide alignments for "main" input
main_IP     <- NA  # integer count of QC'd genome-wide alignments for "main" IP
spike_input <- NA  # integer count of QC'd genome-wide alignments for spike-in input
spike_IP    <- NA  # integer count of QC'd genome-wide alignments for spike-in IP
These alignment counts represent the raw data for each sample.
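For step 1, the counts can be tallied from samtools idxstats output when the spike-in genome is concatenated with the main genome under prefixed contig names. A minimal Python sketch, assuming a hypothetical "spike_" contig prefix and made-up counts:

```python
# Hypothetical `samtools idxstats` output for alignments to a combined
# (main + spike-in) reference: columns are contig, length, mapped, unmapped.
# The "spike_" prefix and all counts below are illustrative, not real data.
idxstats = """\
chr1\t248956422\t9500000\t0
chr2\t242193529\t8200000\t0
spike_2L\t23513712\t150000\t0
spike_2R\t25286936\t170000\t0
*\t0\t0\t54321
"""

main = spike = 0
for line in idxstats.splitlines():
    contig, _, mapped, _ = line.split("\t")
    if contig == "*":
        continue  # the '*' row holds unmapped reads; skip it
    if contig.startswith("spike_"):
        spike += int(mapped)
    else:
        main += int(mapped)

print(main, spike)  # 17700000 320000
```

The same split would be done separately for the input and the IP BAM to obtain all four counts.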
Second, calculate the ratio of spike-in input to "main" input, and calculate the ratio of spike-in IP to "main" IP
## R pseudocode ##
ratio_input <- spike_input / main_input
ratio_IP <- spike_IP / main_IP
The assumption here is that the amount of spike-in chromatin recovered is proportional to the amount of "main" chromatin in each sample, so each ratio captures that sample's combined sequencing depth and recovery efficiency.
Third, for the sample IP, calculate a scaling factor by dividing the input ratio by the IP ratio
## R pseudocode ##
sf_IP <- ratio_input / ratio_IP
This scaling factor aims to correct for potential differences in IP efficiency and sequencing depth between the input and IP samples.
Fourth, for the sample input, set the scaling factor to 1
## R pseudocode ##
sf_input <- 1 # (i.e., from ratio_IP / ratio_IP)
Since the scaling factor for the input is calculated as 1, the input coverage will remain unchanged after scaling.
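Steps 1 through 4 amount to a few lines of arithmetic. A minimal end-to-end sketch in Python, using made-up alignment counts:

```python
# Step 1: hypothetical QC'd alignment counts (real values come from the
# species-split BAM files; these numbers are placeholders).
main_input = 22_000_000   # "main" input alignments
main_IP = 20_000_000      # "main" IP alignments
spike_input = 400_000     # spike-in input alignments
spike_IP = 500_000        # spike-in IP alignments

# Step 2: spike-in-to-"main" ratios for input and IP.
ratio_input = spike_input / main_input
ratio_IP = spike_IP / main_IP

# Step 3: IP scaling factor (input ratio over IP ratio).
sf_IP = ratio_input / ratio_IP

# Step 4: input scaling factor is fixed at 1, so input coverage is unchanged.
sf_input = 1.0

print(round(sf_IP, 3))  # 0.727
```

With these placeholder counts, the IP was relatively enriched for spike-in reads, so its coverage is scaled down relative to the input.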
Finally, using deepTools bamCoverage, compute IP coverage by multiplying by the IP scaling factor
## shell pseudocode ##
main_IP_bam= # QC'd IP alignments to "main" genome (spike-in IP alignments have been excluded)
bin_size= # For example, 1
sf_IP= # Value calculated in step #3
main_IP_bw= # Bigwig file of scaled coverage
bamCoverage \
-b "${main_IP_bam}" \
--binSize "${bin_size}" \
--scaleFactor "${sf_IP}" \
-o "${main_IP_bw}"
This should theoretically normalize the IP coverage based on the spike-in controls.
I've done some searching, but so far I haven't found any publications or resources that describe this particular normalization method. I'm curious to know if any of you have come across or used this approach in your ChIP-seq analyses. Are there any potential benefits or limitations associated with this method?
I would appreciate any insights and feedback on this topic, which will help me make informed decisions about normalization approaches for my ChIP-seq data. Thank you in advance for your time and assistance.
Thank you for this. In your example, you divide the fly counts by all counts (i.e., human plus fly counts) rather than human counts alone, which was what my colleagues described. Is there a reason you use the former denominator over the latter?
Because the spike-in alignment counts are low to begin with, the resulting scaling factors are similar if using (a) only human counts or (b) all counts for the denominator.
(a) Using human counts in the denominator
Determine % of fly reads for each ChIP and input:
Determine scaling factor for each ChIP (input fly % / ChIP fly % = scaling factor):
Normalize scaling factors by dividing all scaling factors by the highest scaling factor (this is used for bamCoverage; if the scaling factor is in the denominator instead, scale by the lowest scaling factor).

(b) Using all counts in the denominator
Determine % of fly reads for each ChIP and input:
Determine scaling factor for each ChIP (input fly % / ChIP fly % = scaling factor):
Normalize scaling factors by dividing all scaling factors by the highest scaling factor (this is used for bamCoverage; if the scaling factor is in the denominator instead, scale by the lowest scaling factor).

Well, the human and fly reads are still from one sample, so it makes sense to take them both into account to address total read-depth differences (as one sample could have many more fly reads than another if the global shift is large). We can get pretty dramatic shifts where fly counts will be >5x higher in a sample with widespread depletion than in a wildtype, because the IP doesn't work as well due to much less protein. That represents a biological shift and is information that shouldn't be tossed out.
In this example, it doesn't make much of a difference in the final number, but in some experiments I expect it'd be a more significant shift. In your example, you can already see a larger impact on samples with a greater proportion of fly reads: a change of 0.018 for ChIP_2 versus 0.01 for ChIP_3.
I can't think of any compelling reasons to not include the fly reads - it may be worth asking your colleagues why they do not.
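To make the (a)-versus-(b) comparison concrete, here is a small Python sketch with made-up counts at a typical low spike-in fraction; the two denominators give nearly identical scaling factors, and the gap grows as the fly fraction grows:

```python
# Hypothetical QC'd alignment counts for one ChIP and its matched input
# (illustrative numbers only, not from any real dataset).
input_human, input_fly = 22_000_000, 400_000
chip_human, chip_fly = 20_000_000, 500_000

# (a) Fly fraction computed against human counts only.
sf_a = (input_fly / input_human) / (chip_fly / chip_human)

# (b) Fly fraction computed against all counts (human + fly).
sf_b = (input_fly / (input_human + input_fly)) / (chip_fly / (chip_human + chip_fly))

print(round(sf_a, 3), round(sf_b, 3))  # 0.727 0.732
```

At a ~2% fly fraction the two factors differ by well under 0.01; re-running this with a much larger chip_fly (a sample where the IP pulled down little main chromatin) widens the gap.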
Thank you, makes sense.
Good point, I will do so.
Thanks again, jared.andrews07
In your previous work, did you happen to work with any ChIP-seq datasets where this shift is demonstrable?
One of the goals of my current work is, in addition to standard analyses, to provide my colleagues with a kind of tutorial for basic ChIP-seq analyses. I'd like to present and demonstrate the point that, with this approach to normalization, it's preferable to divide by the total counts (i.e., the "main" plus spike-in counts) rather than just the "main" counts to address read-depth differences. The shift is not apparent with the datasets I'm currently working with, i.e., dividing by either total or "main" yields similar scaling coefficients. I want to clearly show this because some colleagues have been dividing by only "main" counts and getting (apparently) reasonable results; I want to make the point that more extreme circumstances will yield inappropriate scaling coefficients.
So, if you happen to have encountered or know of any publicly available datasets useful for this purpose, would you mind pointing me to their papers or accession numbers?
The study I linked to in my answer has some ChIP-seq data that will have some skew, as it contains data for a mutation that ablates H3K27me3 across the genome (with focal retention at certain loci, so not completely gone). So the samples with the mutation tend to have more fly reads than those without, as the IP just doesn't pull down as much.
If a rather extreme toy example is good enough, let's use the calculations in your comment above. I kicked the ChIP_2 fly reads up to 9 million, and using all reads results in a normalized scaling factor of 0.144 while using only the human reads results in 0.116. Not a huge absolute difference, but noticeable compared to the difference between using all reads and only human reads seen for ChIP_3 (0.028 for ChIP_2 versus 0.01 for ChIP_3).
Sounds good, thank you so much!
Perhaps this question is better suited for a new post, but since it follows on this work, I will post it here for now and move it if requested.
If I wanted to use DESeq2 for a differential binding analysis by generating a matrix of ChIP counts per n-bp bins, is it reasonable for me to supply the scaling factors calculated by the above-described method rather than use the values generated by DESeq2::estimateSizeFactors() (using either the human "main" counts or the fly spike-in counts)?

I'd probably use DiffBind and use the spike-ins in the normalization function (it uses DESeq2 on the backend). It limits analysis to peak regions though, so depending on what you're trying to do, you may have to adjust.
Hi jared.andrews07 — thanks again for your advice and guidance.
A couple quick follow-up questions regarding these points:
#1 — If we wanted to plot input coverage alongside ChIP coverage, would you also scale input down in this manner?
For example...
#2 — And if we wanted to examine the log2 ratios of ChIP to input with, e.g., bigwigCompare, we'd use the scaled-down ChIP and input bigwigs? (However, to identify these kinds of positional changes in signal with confidence, I think a statistical approach (e.g., via DiffBind, DESeq2, csaw, etc.) is better suited than visualizing ratios of ChIP coverage to input coverage, but I'm asking because this is what my bench biologist colleagues are used to.)
I typically don't bother normalizing the input tracks, so kind of your call. I suppose I would scale them similarly.
I do the comparisons with the tools you mentioned to find actual differential loci, then use the normalized tracks if I need to actually show the profile/tracks in a figure. I don't do direct comparison of the tracks - they are just a nice visual aid to go along with the statistical methods.
Hello,
The calculation method for the spike-in scale factor you provided has inspired me. However, I am confused about it. In the study you linked, the scale factor calculation was described as follows: "The spike-in normalization was performed by counting Drosophila melanogaster reads and mouse reads in each IP sample and corresponding Input sample, and using those counts to generate the spike-in normalization factor for each sample, which was calculated as (IP_dm3.reads/IP_mm9.reads)/(INPUT_dm3.reads/INPUT_mm9.reads)." This differs from your answer, which states "input fly%/ChIP fly% = scaling factor." How should I choose the appropriate normalization calculation method?
Hi,
I agree that this is confusing. Also, after reading the Methods section of the publication mentioned above, I don't know how the spike-in-derived scaling factor was applied except that—if I understand correctly—it was somehow done via HOMER. Perhaps jared.andrews07 can weigh in on this.
Were I to pursue a spike-in normalization method like this, then I would go with the "input fly%/ChIP fly% = scaling factor" approach. (However, assuming I had the necessary values from the ChIP-seq benchwork to run the calculations, I would avoid this approach in favor of the relatively new and more rigorous siQ-ChIP method.)
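For what it's worth, the two formulas are roughly reciprocals of one another: with made-up counts, the published (IP ratio)/(input ratio) factor comes out as approximately 1 over the input-fly-% / ChIP-fly-% factor, with the small residual coming from using total rather than "main" counts in the percentages. A quick Python check:

```python
# Hypothetical alignment counts: dm = spike-in (fly), mm = "main" (mouse).
# All numbers are illustrative placeholders.
IN_dm, IN_mm = 100_000, 10_000_000   # input spike-in and main counts
IP_dm, IP_mm = 400_000, 8_000_000    # IP spike-in and main counts

# Published formula: (IP_dm3.reads/IP_mm9.reads)/(INPUT_dm3.reads/INPUT_mm9.reads)
sf_paper = (IP_dm / IP_mm) / (IN_dm / IN_mm)

# Input fly% / ChIP fly%, with percentages taken over total counts.
sf_pct = (IN_dm / (IN_dm + IN_mm)) / (IP_dm / (IP_dm + IP_mm))

print(round(sf_paper, 2), round(sf_pct, 3))  # 5.0 0.208
print(round(sf_paper * sf_pct, 2))           # 1.04 (approximately reciprocal)
```

So the choice between them is largely a convention of whether coverage gets multiplied or divided by the factor; what matters is applying one convention consistently across all samples.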