In a previous post, I asked about the computation and application of spike-in derived scaling factors in ChIP-seq data processing. (See below for a summary of the method described in the post’s answer.)
If I understand correctly, these scaling factors should be applied to the ratio of IP to input coverage (e.g., with deepTools bamCompare, which can compute the ratio of two BAM files). This ratio represents the relative enrichment of protein-DNA interactions captured in the immunoprecipitation reaction, adjusted for background signal by the input. In contrast, applying the scaling factor directly to IP coverage alone (e.g., using deepTools bamCoverage) fails to account for variability in background signal. This IP-only approach has been described in these posts.
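To make the distinction concrete, here is a minimal sketch with made-up per-bin coverage values. It is not how deepTools does it internally. The names `ip_cov`, `input_cov`, and `sf` are hypothetical, and the coverage values are assumed to be depth-normalized already, with no zero-coverage input bins.

```python
# Toy per-bin coverage for one sample (made-up numbers, assumed depth-normalized).
ip_cov = [40.0, 10.0, 8.0]
input_cov = [10.0, 10.0, 2.0]

# Hypothetical spike-in-derived scaling factor for this sample.
sf = 0.5

# Option 1: scale the IP/input ratio, so each bin is corrected for its background.
scaled_ratio = [sf * ip / inp for ip, inp in zip(ip_cov, input_cov)]

# Option 2: scale IP coverage alone, ignoring the input entirely.
scaled_ip = [sf * ip for ip in ip_cov]

print(scaled_ratio)  # [2.0, 0.5, 2.0]
print(scaled_ip)     # [20.0, 5.0, 4.0]
```

In this toy case the first and third bins have the same scaled enrichment over input, yet their scaled IP-only values differ fivefold because the background differs between them.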
Is this understanding correct? How do other ChIP-seq researchers typically apply spike-in scaling factors in their analyses?
Method described in the previous post's answer:
To compute a spike-in-derived scaling factor for a given sample, take the following steps:
- Calculate the percentage of exogenous spike-in read alignments in the immunoprecipitate (IP) BAM file.
- Calculate the percentage of exogenous spike-in read alignments in the corresponding input BAM file.
- Compute the scaling factor by dividing the input spike-in percentage by the IP spike-in percentage, i.e.,
scaling factor = (input spike-in pct) / (IP spike-in pct)
- If comparing a group of samples, adjust the scaling factors by dividing each by the largest scaling factor in the group. This adjustment sets the maximum scaling factor to 1, ensuring consistent scaling across samples while preventing artificial signal inflation.
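
The steps above can be sketched as follows. This is a minimal illustration under a few assumptions: the alignment counts are made up, the function names are hypothetical, and the spike-in and total alignment counts are assumed to have been obtained beforehand (for example, from per-chromosome alignment counts of each BAM file).

```python
def spike_in_pct(spike_in_alignments, total_alignments):
    """Percentage of alignments that map to the exogenous spike-in genome."""
    if total_alignments <= 0:
        raise ValueError("total_alignments must be positive")
    return 100.0 * spike_in_alignments / total_alignments


def scaling_factor(ip_spike, ip_total, input_spike, input_total):
    """scaling factor = (input spike-in pct) / (IP spike-in pct)."""
    ip_pct = spike_in_pct(ip_spike, ip_total)
    input_pct = spike_in_pct(input_spike, input_total)
    if ip_pct == 0:
        raise ValueError("IP sample has no spike-in alignments")
    return input_pct / ip_pct


def rescale_to_max(factors):
    """Divide every factor by the largest one so the maximum becomes 1."""
    largest = max(factors.values())
    return {name: sf / largest for name, sf in factors.items()}


# Made-up counts: (IP spike-in, IP total, input spike-in, input total).
counts = {
    "sample_A": (50_000, 1_000_000, 20_000, 1_000_000),
    "sample_B": (100_000, 1_000_000, 20_000, 1_000_000),
}

factors = {name: scaling_factor(*c) for name, c in counts.items()}
adjusted = rescale_to_max(factors)

print(factors)   # {'sample_A': 0.4, 'sample_B': 0.2}
print(adjusted)  # {'sample_A': 1.0, 'sample_B': 0.5}
```

Here sample_B has twice the spike-in percentage in its IP, so it receives half the scaling factor of sample_A, and after the group adjustment the largest factor is 1.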