Hi everyone,
I have a bit of an odd question which admittedly has come more from my PI than me.
We have done some ChIP-seq analysis for a TF on some tissue and cell samples. I have analysed all of these samples with the exact same pipeline, and for each sample I have run MACS2 to identify peaks against that sample's own input control. What I want to know is: if more peaks are called in one sample than in another, is it fair to say that the sample with the higher peak number has more TF binding to DNA? For example, I have sample A with 3900 peaks, sample B with 11000 peaks and sample C with 19000 peaks. In order to say that sample C has more TF binding to the DNA than the other samples, is there any sort of normalisation I should do? Are these numbers comparable to each other as is? I've read a lot about normalising peak heights between samples (using library size etc.) but this is a different question. Can the numbers of peaks themselves be compared between samples?
Adding to your (already excellent) reply:
Was the IP efficiency the same in each cell type? Relatedly, were the cell number and DNA extraction efficiency similar? Is the genomic sequence identical (or close enough) in each cell type (cell lines are typically a bit different)? Do the inputs show essentially identical read distributions between samples? Is the noise level surrounding peaks similar between samples (this is related to IP efficiency, but since you mentioned MACS2, peak calling is sensitive to it)?
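One rough way to put a number on the IP efficiency / noise question is the fraction of reads in peaks (FRiP), a commonly used ChIP-seq QC metric. This is my suggestion rather than something from the original question. Below is a minimal sketch in plain Python, assuming you have already pulled read positions and peak intervals into dictionaries keyed by chromosome; the function names and the toy data are hypothetical, and in practice you would read these from your BAM and MACS2 narrowPeak files with whatever tools you normally use.

```python
from bisect import bisect_right


def merge_intervals(peaks):
    """Merge overlapping half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.

    read_positions: dict chrom -> list of int positions (one per read)
    peaks:          dict chrom -> list of (start, end) half-open intervals
    """
    total = 0
    in_peaks = 0
    for chrom, positions in read_positions.items():
        merged = merge_intervals(peaks.get(chrom, []))
        starts = [s for s, _ in merged]
        for pos in positions:
            total += 1
            # Index of the last merged interval starting at or before pos.
            i = bisect_right(starts, pos) - 1
            if i >= 0 and pos < merged[i][1]:
                in_peaks += 1
    return in_peaks / total if total else 0.0


# Toy data (hypothetical): 5 reads, 2 of which fall inside the merged peak.
toy_reads = {"chr1": [5, 15, 25, 105], "chr2": [50]}
toy_peaks = {"chr1": [(10, 30), (20, 40)]}
print(frip(toy_reads, toy_peaks))  # 0.4
```

If FRiP differs a lot between samples A, B and C, the differences in peak number may reflect signal-to-noise rather than how much TF is bound, so I would be cautious about interpreting the raw counts.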