I have ChIP-seq data on histone modifications. I've been scouring the literature and blogs on ChIP-seq analysis, specifically on normalizing IP signal to input and normalizing across samples using exogenous spike-ins.
There doesn't seem to be a coherent differential binding analysis approach that incorporates both input normalization and spike-in normalization.
Most differential binding approaches seem to involve applying RNA-seq methods (edgeR, DESeq2) to read counts over genomic windows. I can substitute the normalization factors used in these RNA-seq packages with spike-in normalization factors (sketched below), but how do I account for input? Is blacklisting windows that are not different from input really the best way (see the filtering sketch further down)? Transforming the counts relative to input, via log2 fold change or subtraction, is not statistically sound (other bioinformaticians seem to agree).
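To make concrete what I mean by substituting the factors, here is a minimal edgeR sketch. `counts` (a windows-by-samples matrix of IP read counts), `spike_reads` (per-sample totals of reads mapped to the spike-in genome), and `group` (condition labels) are placeholders for my own data; the only non-standard step is overriding `norm.factors` instead of calling `calcNormFactors()`.

```r
library(edgeR)

## counts, spike_reads, group: placeholders for my own data (see above)
y <- DGEList(counts = counts, group = group)

## edgeR's effective library size is lib.size * norm.factors, so set the
## factors such that effective sizes track spike-in depth rather than
## total read depth; calcNormFactors() is skipped entirely.
sf <- spike_reads / y$samples$lib.size
y$samples$norm.factors <- sf / exp(mean(log(sf)))  # centre so factors multiply to ~1

## Standard edgeR quasi-likelihood differential test from here on
design <- model.matrix(~ group)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

The DESeq2 equivalent would be assigning spike-in-derived factors to `sizeFactors(dds)` before running `DESeq()`. But none of this touches input, which is my actual question.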
I've looked at the input signal in my data and found signal patterns in regions consistent with some of my histone marks. This makes me think I really should normalize my IP to input before performing differential binding analysis.
The presence of binding bias in input samples also seems to be supported by this paper (http://www.pnas.org/content/106/35/14926.long), where crosslinked, sonicated ChIP-seq samples with no IP step showed signal corresponding to open chromatin.
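For what it's worth, the closest thing I've found to a statistically defensible use of input is the filtering approach described in the csaw documentation: don't transform the IP counts by input at all, just drop windows that aren't enriched over input, then run the actual test (with spike-in factors, as above) on what's left. A rough sketch, with `chip.bams` / `input.bams` as placeholders for my BAM files:

```r
library(csaw)

param <- readParam(minq = 20)  # placeholder read-filtering settings

## Count reads in sliding windows for the IP and matched input libraries
chip <- windowCounts(chip.bams, width = 150, param = param)
ctrl <- windowCounts(input.bams, width = 150, param = param)

## Large bins estimate a global IP:input scaling factor...
chip.bin <- windowCounts(chip.bams, bin = TRUE, width = 10000, param = param)
ctrl.bin <- windowCounts(input.bams, bin = TRUE, width = 10000, param = param)
scale.info <- scaleControlFilter(chip.bin, ctrl.bin)

## ...and windows are kept only if IP signal sufficiently exceeds input;
## the input is never used to transform the counts themselves.
fstat <- filterWindowsControl(chip, ctrl, prior.count = 2, scale.info = scale.info)
keep <- fstat$filter > log2(3)  # e.g. require >3-fold enrichment over input
chip.filtered <- chip[keep, ]
```

The filtered windows would then go into edgeR as sketched above. But this is still just blacklisting by another name, which brings me back to whether filtering is really the best we can do.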
Maybe input normalization isn't even necessary if we assume that the input background is consistent across my different histone modification IPs? However, wouldn't that assumption decrease the statistical power of the differential binding analysis?
This is my first time analyzing ChIP-seq data. Any thoughts from experts would be appreciated.
Without being an expert, I have been told not to use input for normalization across samples, and that its use is best limited to peak calling within conditions and to visualisation (to check that peaks in the IP are not also present in the input).