In general, I would like to validate my ChIP-seq output from MACS2. My ChIP-seq dataset contains libraries that are not pure technical replicates -- the biological sample (1 tube) was divided into three samples (three tubes) for sequencing. The variation between samples is likely due to the sequencer. In any case, how can I validate/compare the replicates computationally?
Nice! However, how current is that ENCODE pipeline (your first link)? I've used the main IDR repo recently but was never quite sure how running IDR this way compares. Also, how important is it to go through the process of generating and calling peaks from pseudoreplicates (as per the ENCODE pipeline)? Does your pipeline automate this?
I'm not sure you really have to worry too much about how current the ENCODE pipeline is, as it's still extensively used and the component software (e.g., IDR, SPP, MACS2) is still being actively developed. I think of it as a pseudo-gold-standard pipeline (in the absence of validation :) ) for TF ChIP peak calling.
I think subsampling just makes the analysis more rigorous. I mean, if you see certain peaks in one pseudoreplicate and not the other, or the peaks are drastically different from baseline, it's kinda questionable whether it's real signal. But yes, my pipeline has automated the pseudoreplicate portion as well. I'm not sure if it will work out of the box for you, as you'll likely have a different cloud / HPC setup than me. But it should be compatible with a VM running Ubuntu 14.04 LTS, which you can rent on AWS. You'll likely want to go line by line for a small set of test samples, see where things break, make whatever changes are needed, and then throw the kitchen sink at it.
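To make the pseudoreplicate idea concrete, here's a rough Python sketch of the two steps: randomly split the reads into two halves, then check what fraction of peaks from one half also show up in the other. This is only an illustration, not what the ENCODE pipeline or my pipeline actually runs. The function names `split_pseudoreplicates` and `fraction_overlapping` are made up for this example, and it assumes reads are held in a plain list and peaks are `(chrom, start, end)` tuples in BED-style half-open coordinates. In practice you'd split the aligned reads, call peaks on each half with MACS2, and compare those peak files (or hand them to IDR).

```python
import random


def split_pseudoreplicates(reads, seed=0):
    """Shuffle reads and split them into two (nearly) equal pseudoreplicates.

    `reads` can be any sequence of read records; a fixed seed makes the
    split reproducible. (Hypothetical helper, for illustration only.)
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates, so
    intervals that merely touch (end == start) do not count as overlapping.
    (Hypothetical helper, for illustration only.)
    """
    if not peaks_a:
        return 0.0
    # Group the second peak set by chromosome for a simple lookup.
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        if any(start < e and s < end for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a)


if __name__ == "__main__":
    # Toy data standing in for aligned reads and called peaks.
    reads = ["read%d" % i for i in range(10)]
    pr1, pr2 = split_pseudoreplicates(reads, seed=42)
    print(len(pr1), len(pr2))

    peaks_pr1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    peaks_pr2 = [("chr1", 150, 250), ("chr2", 200, 300)]
    print(fraction_overlapping(peaks_pr1, peaks_pr2))
```

A low overlap fraction between the two pseudoreplicates would be a hint that many peaks are driven by sampling noise rather than real signal. The linear scan over peaks is fine for a toy example, but for real peak sets you'd want something like bedtools or an interval tree.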