Question

Can I run ENCODE ChIP-Seq pipeline or get their code?

7.7 years ago

rd ▴ 20

I found some interesting ENCODE CTCF ChIP-Seq samples that I would like to compare with other publicly available data. To avoid technical artifacts caused by differences between pipelines, I would like to process all of the public samples with the ENCODE ChIP-Seq pipeline so that everything is directly comparable to the ENCODE CTCF samples I am interested in. Is the code/pipeline that ENCODE uses available anywhere? Have they made it public? I spent a few hours on Google without finding anything.

I have found general pipeline notes on their GitHub account like:

"1- Map reads with BWA, mark duplicates with Picard, and remove duplicates. 2- Estimate library complexity and calculate NRF (non-redundant fraction), PBC1, PBC2 (PCR bottleneck coefficient). 3- Calculate cross-correlation analysis with spp/phantompeakqualtools. 4- Generate p-value and fold-over-control signal tracks for each replicate and replicates pooled with MACS2. Call peaks with MACS2."
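For reference, the library-complexity metrics named in step 2 can be computed from read start positions. The following is a minimal sketch using the commonly cited ENCODE definitions (NRF = distinct positions / total reads, PBC1 = positions with exactly one read / distinct positions, PBC2 = positions with exactly one read / positions with exactly two reads). The function name and the toy input are made up for illustration; this is not the ENCODE code, which works on filtered BAM files.

```python
from collections import Counter
import math


def library_complexity(read_positions):
    """Compute NRF, PBC1 and PBC2 from uniquely mapped read positions.

    read_positions: iterable of hashable (chrom, start, strand) tuples,
    one entry per uniquely mapped read (hypothetical input format).
    """
    counts = Counter(read_positions)           # reads per genomic position
    total_reads = sum(counts.values())
    distinct = len(counts)                     # positions with >= 1 read
    m1 = sum(1 for c in counts.values() if c == 1)  # exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)  # exactly two reads

    return {
        "NRF": distinct / total_reads if total_reads else math.nan,
        "PBC1": m1 / distinct if distinct else math.nan,
        "PBC2": m1 / m2 if m2 else math.inf,
    }


# Toy example: six reads, two positions duplicated once each.
reads = [
    ("chr1", 100, "+"), ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"), ("chr2", 40, "+"),
    ("chr2", 900, "+"),
]
metrics = library_complexity(reads)
print(metrics)
```

In a real run the tuples would come from the deduplication-ready alignments rather than a hand-written list.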

Every tool has a set of parameters that can alter the outcome a little bit. So I wanted to check whether the actual pipeline is available before I write a pipeline based on the general description I found in papers.
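To illustrate the concern about parameters: if the pipeline has to be rebuilt from the description, one option is to keep every tunable value in a single place and generate identical command lines for all samples. The sketch below only builds and prints a MACS2 command; it does not run anything. The parameter values and file names are placeholders chosen for illustration and are not ENCODE's actual settings.

```python
import shlex

# Placeholder values for illustration only; NOT the ENCODE settings.
PARAMS = {
    "genome_size": "hs",   # MACS2 shortcut for the human genome
    "pvalue": 0.01,
    "extsize": 200,
}


def macs2_callpeak_cmd(chip_bam, control_bam, name, params=PARAMS):
    """Build a MACS2 callpeak command line as a list of arguments."""
    return [
        "macs2", "callpeak",
        "-t", chip_bam,
        "-c", control_bam,
        "-f", "BAM",
        "-g", params["genome_size"],
        "-n", name,
        "-p", str(params["pvalue"]),
        "--nomodel",
        "--extsize", str(params["extsize"]),
        "--keep-dup", "all",
        "-B", "--SPMR",
    ]


# Hypothetical sample names; every sample gets the same parameters.
samples = {
    "encode_ctcf": ("encode_ctcf.bam", "encode_input.bam"),
    "public_ctcf": ("public_ctcf.bam", "public_input.bam"),
}
commands = {
    name: macs2_callpeak_cmd(chip, ctrl, name)
    for name, (chip, ctrl) in samples.items()
}
for cmd in commands.values():
    print(shlex.join(cmd))
```

Keeping the values in one dictionary makes it easy to swap in the real settings once they are known, and to record exactly what was run.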

Thanks!

ChIP-Seq ENCODE • 5.1k views

ADD COMMENT • link updated 7.7 years ago by GouthamAtla 12k • written 7.7 years ago by rd ▴ 20

score 0 · Answer 1 · 2017-03-30

7.7 years ago

Santosh Anand 5.8k

http://encodedcc.stanford.edu/encodevm/

ADD COMMENT • link 7.7 years ago by Santosh Anand 5.8k

I downloaded the ENCODEVM and looked through most of the folders. There are R scripts to make their plots but no ChIP-Seq pipeline. Most scripts seem to take already processed ChIP-Seq data as input. Did I miss something?

ADD REPLY • link 7.7 years ago by rd ▴ 20

Hmmm. Check whether their IDR framework has what you are looking for: https://sites.google.com/site/anshulkundaje/projects/idr

ADD REPLY • link 7.7 years ago by Santosh Anand 5.8k

score 0 · Answer 2 · 2017-03-30

7.7 years ago

karl.sebby ▴ 100

The RNA-seq pipelines have a list of steps and associated bash scripts that show how the tools are being used (https://github.com/ENCODE-DCC/long-rna-seq-pipeline/tree/master/dnanexus), but I don't see that for the ChIP-seq pipeline. I have found it useful to look at experiments that use the pipeline, rather than just the pipeline itself, e.g. https://www.encodeproject.org/experiments/ENCSR670JDQ/.

ADD COMMENT • link 7.7 years ago by karl.sebby ▴ 100

The easiest solution would be to run it on the DNAnexus platform. Or you could always try to get an account and see if you can see the parameters there. I tried to get an account with my Gmail address but was denied; it wanted an institutional address, so I'm not sure how hard it is to get onto their platform just to look around.

ADD REPLY • link 7.7 years ago by karl.sebby ▴ 100

I also noticed that some of the ENCODE pipelines have explicit code on GitHub, which is why I thought I might be missing something for ChIP-Seq.

I made an account on DNAnexus, and it seems to have workflows one can use on the platform, but I can't seem to extract the parameters to run them on my own machine. In some ways it resembles Galaxy. If that is my only option, then I will analyze the data through the DNAnexus portal. Right now I am downloading the ENCODE virtual machine at the suggestion of Santosh Anand. I will report back if I find it useful.

ADD REPLY • link 7.7 years ago by rd ▴ 20

score 0 · Answer 3 · 2017-03-30

7.7 years ago

GouthamAtla 12k

You can definitely run it. All the scripts are on GitHub with a detailed protocol.

https://github.com/kundajelab/chipseq_pipeline

ADD COMMENT • link 7.7 years ago by GouthamAtla 12k

Thanks! I will try to go through it.

ADD REPLY • link 7.7 years ago by rd ▴ 20