In short: Is there a ChIP-Seq peak caller that accounts for replicates? If not, what is your recommended way to use replicates?
A bit more verbose: It seems to me that all the available peak callers call peaks in a single pull-down experiment (See here for some popular programs). They do this using a wide variety of methods and levels of sophistication.
However, if you have (and you should have...) replicates of the same experiment, it remains unclear how to make the best use of the variability between replicates. Unless I've missed it, there is no peak caller designed for that. Very often I see a peak called in one replicate which is missed in another replicate, even if a "bump" is definitely there.
In my opinion/experience, the options available to combine replicates are:
- Irreproducible discovery rate (IDR). I've heard some skepticism about it. And does it work for more than two replicates?
- Call peaks on individual replicates and use some sort of heuristic to define the final set (e.g. peaks in n out of m replicates and/or combine p-values from different replicates). In the final set, one would like to have for each peak the position, an estimate of significance, enrichment, etc. How to get this information is not obvious.
- Just combine the individual input files and call peaks on that. This is the easiest option, but you obviously throw away the information on sample-to-sample variability.
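To make the second option concrete, here is a minimal sketch of the "n out of m replicates" heuristic with combined p-values. Everything in it is an assumption for illustration: the peak tuples `(chrom, start, end, pvalue)` with half-open coordinates, the function names `fisher_combine` and `consensus_peaks`, and the choice of Fisher's method (which assumes the per-replicate p-values are independent). It does not come from any existing peak caller.

```python
import math


def fisher_combine(pvalues):
    """Combine p-values with Fisher's method (assumes independence)."""
    k = len(pvalues)
    # Half of Fisher's statistic X = -2 * sum(ln p); clamp to avoid log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def consensus_peaks(replicates, min_support):
    """Merge overlapping peaks across replicates and keep supported regions.

    replicates: list (one entry per replicate) of lists of
                (chrom, start, end, pvalue) tuples, half-open coordinates.
    Returns (chrom, start, end, n_supporting_replicates, combined_p) tuples.
    """
    tagged = sorted(
        (chrom, start, end, p, rep_idx)
        for rep_idx, peaks in enumerate(replicates)
        for chrom, start, end, p in peaks
    )
    merged = []
    for chrom, start, end, p, rep_idx in tagged:
        last = merged[-1] if merged else None
        if last and last["chrom"] == chrom and start < last["end"]:
            last["end"] = max(last["end"], end)  # extend the open region
            region = last
        else:
            region = {"chrom": chrom, "start": start, "end": end, "best_p": {}}
            merged.append(region)
        # Keep only the strongest peak per replicate inside the region.
        region["best_p"][rep_idx] = min(p, region["best_p"].get(rep_idx, 1.0))

    result = []
    for region in merged:
        support = len(region["best_p"])
        if support >= min_support:
            combined = fisher_combine(list(region["best_p"].values()))
            result.append(
                (region["chrom"], region["start"], region["end"], support, combined)
            )
    return result
```

With three replicates and `min_support=2`, a region is reported only if at least two replicates contribute a peak to it. Note that replicates without a called peak contribute nothing to the combined p-value, so the combined value is not a full account of between-replicate variability.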
Any thoughts/ideas?
Thanks!
The IDR pipeline from Anshul will work on 2+ replicates; you would just have to do a round-robin type of comparison (A vs B, A vs C, B vs C) at one of the steps.
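As a small illustration of the round-robin idea, the sketch below only enumerates the pairs that would need to be compared. The function name `round_robin_pairs` and the replicate labels are hypothetical, and how each pair is then passed to the IDR pipeline is not shown here.

```python
from itertools import combinations


def round_robin_pairs(replicates):
    """Return every unordered pair of replicates, in input order."""
    return list(combinations(replicates, 2))


for first, second in round_robin_pairs(["A", "B", "C"]):
    print(f"{first} vs {second}")
```

For m replicates this gives m * (m - 1) / 2 comparisons, so the number of pairwise runs grows quickly as replicates are added.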