I found some interesting ENCODE CTCF ChIP-Seq samples that I would like to compare to other publicly available data. To avoid technical artifacts caused by differences between pipelines, I would like to process all of the public samples with the ENCODE ChIP-Seq pipeline so that everything is directly comparable to the ENCODE CTCF samples I am interested in. Is it possible to find the code/pipeline ENCODE uses? Have they made it publicly available? I spent a few hours on Google, but that hasn't turned up anything yet.
I have found general pipeline notes on their GitHub account like:
"1- Map reads with BWA, mark duplicates with Picard, and remove duplicates. 2- Estimate library complexity and calculate NRF (non-redundant fraction), PBC1, PBC2 (PCR bottleneck coefficient). 3- Calculate cross-correlation analysis with spp/phantompeakqualtools. 4- Generate p-value and fold-over-control signal tracks for each replicate and replicates pooled with MACS2. Call peaks with MACS2."
Every tool has a set of parameters that can alter the outcome a little bit. So I wanted to check whether the actual pipeline is available before I write a pipeline based on the general description I found in papers.
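To make the library-complexity metrics named in step 2 of those notes concrete, here is a minimal Python sketch. It assumes the commonly cited definitions (NRF = distinct positions / total reads, PBC1 = positions with exactly one read / distinct positions, PBC2 = positions with exactly one read / positions with exactly two reads). The function name and the (chrom, start, strand) input format are hypothetical and not taken from the ENCODE code, which may filter or count reads differently.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from uniquely mapped read positions.

    `positions` is an iterable of hashable keys, e.g. (chrom, start, strand).
    This input format is an assumption made for illustration only.
    """
    counts = Counter(positions)            # reads per distinct position
    total = sum(counts.values())           # all reads
    distinct = len(counts)                 # distinct positions
    m1 = sum(1 for c in counts.values() if c == 1)  # positions seen once
    m2 = sum(1 for c in counts.values() if c == 2)  # positions seen twice

    nan = float("nan")
    return {
        "NRF": distinct / total if total else nan,
        "PBC1": m1 / distinct if distinct else nan,
        # PBC2 is undefined when no position has exactly two reads;
        # reporting infinity here is a choice made for this sketch.
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "+"), ("chr1", 100, "+"),
        ("chr1", 200, "-"),
        ("chr2", 50, "+"), ("chr2", 50, "+"),
        ("chr2", 300, "+"),
    ]
    print(library_complexity(reads))
```

Even in this small example there are choices the notes do not pin down (which reads count as uniquely mapped, whether strand is part of the position key), which is why the exact pipeline parameters matter.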
Thanks!
I downloaded the ENCODEVM and looked through most of the folders. There are R scripts to make their plots, but no ChIP-Seq pipeline. Most scripts seem to take already-processed ChIP-Seq data as input. Did I miss something?
Hmmm. Check whether their IDR framework has what you are looking for: https://sites.google.com/site/anshulkundaje/projects/idr