Question

Identifying H3K27ac peaks from the UCSC hg19 annotation database


5.7 years ago

Shicheng Guo ★ 9.6k

Hi All,

I want to obtain H3K27ac annotations for the HCT116 cell line, so I downloaded the related annotation files from the UCSC hg19 annotation database folder (listed below). My question: is all I need to do to remove the input peaks (wgEncodeSydhHistoneHct116InputUcdSig.txt.gz) from wgEncodeSydhHistoneHct116H3k27acUcdPk.txt.gz?

  wgEncodeSydhHistoneHct116H3k27acUcdPk.sql             17-Jun-2012 10:26  1.7K  
  wgEncodeSydhHistoneHct116H3k27acUcdPk.txt.gz          17-Jun-2012 10:26  488K  
  wgEncodeSydhHistoneHct116H3k27acUcdSig.sql            17-Jun-2012 10:26  1.3K  
  wgEncodeSydhHistoneHct116H3k27acUcdSig.txt.gz         17-Jun-2012 10:26  124   
  wgEncodeSydhHistoneHct116InputUcdSig.sql              17-Jun-2012 10:26  1.3K  
  wgEncodeSydhHistoneHct116InputUcdSig.txt.gz

http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/
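Before deciding whether anything needs to be subtracted, it helps to look at what the peak table actually contains. Below is a minimal Python sketch that converts the UCSC table dump to BED6. It assumes the table follows the narrowPeak column order preceded by UCSC's `bin` column; that layout is an assumption, so check it against wgEncodeSydhHistoneHct116H3k27acUcdPk.sql first. The function names and the `COLUMNS` list are hypothetical.

```python
import gzip

# ASSUMED column order: UCSC "bin" column followed by the ten narrowPeak fields.
# Verify against wgEncodeSydhHistoneHct116H3k27acUcdPk.sql before use.
COLUMNS = ["bin", "chrom", "chromStart", "chromEnd", "name", "score",
           "strand", "signalValue", "pValue", "qValue", "peak"]


def parse_peak_line(line: str) -> dict:
    """Split one tab-separated table row into a dict keyed by COLUMNS."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        # Fail loudly if the real schema differs from the assumed one.
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    rec = dict(zip(COLUMNS, fields))
    rec["chromStart"] = int(rec["chromStart"])  # 0-based start
    rec["chromEnd"] = int(rec["chromEnd"])      # exclusive end
    rec["signalValue"] = float(rec["signalValue"])
    return rec


def peaks_to_bed(in_path: str, out_path: str) -> int:
    """Write the peaks as BED6 (chrom, start, end, name, score, strand).

    Returns the number of peaks written.
    """
    n = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            r = parse_peak_line(line)
            dst.write(f"{r['chrom']}\t{r['chromStart']}\t{r['chromEnd']}\t"
                      f"{r['name']}\t{r['score']}\t{r['strand']}\n")
            n += 1
    return n
```

For example, `peaks_to_bed("wgEncodeSydhHistoneHct116H3k27acUcdPk.txt.gz", "hct116_h3k27ac.bed")` (output name hypothetical) would write a BED file that can be loaded into a browser or intersected with other tracks. Raising on a column-count mismatch is deliberate: a silently misaligned schema is worse than an error.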

Thanks.

Tags: H3K27ac, peak, input



Nobody has offered a suggestion here, but I did receive a response from the ENCODE Project, reproduced below:

Thank you for your interest in ENCODE data and for contacting ENCODE.

First, we highly recommend using the official ENCODE portal at https://www.encodeproject.org/ rather than other sources. The specific experiments/annotations you are interested in are listed here: https://www.encodeproject.org/search/?biosample_ontology.term_name=HCT116&type=Experiment&assay_title=ChIP-seq&lab.title=Peggy+Farnham%2C+USC&status=released&target.investigated_as=narrow+histone+mark&target.investigated_as=control . When you look at specific experiments, you may notice that some older lab-processed data have been deprecated (archived). ENCODE has reprocessed the raw data with the uniform ENCODE processing pipeline, which makes experiments from different labs consistent and comparable.
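The search link above is just a set of query parameters, so the same listing can be retrieved programmatically. A minimal Python sketch, assuming the portal's search endpoint returns JSON when asked with `format=json` (with `limit=all` to avoid paging) and lists results under an `@graph` key; the helper names and `FILTERS` list are hypothetical, and fetching requires network access.

```python
import json
import urllib.parse
import urllib.request

ENCODE_SEARCH = "https://www.encodeproject.org/search/"

# Same filters as the search link above; repeated keys are allowed.
FILTERS = [
    ("type", "Experiment"),
    ("biosample_ontology.term_name", "HCT116"),
    ("assay_title", "ChIP-seq"),
    ("lab.title", "Peggy Farnham, USC"),
    ("status", "released"),
    ("target.investigated_as", "narrow histone mark"),
    ("target.investigated_as", "control"),
]


def build_search_url(filters, as_json=True):
    """Encode the filters as a portal search URL (spaces become '+')."""
    params = list(filters)
    if as_json:
        params += [("format", "json"), ("limit", "all")]
    return ENCODE_SEARCH + "?" + urllib.parse.urlencode(params)


def fetch_experiments(url):
    """Return the result objects listed under '@graph' in the JSON reply."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp).get("@graph", [])
```

Usage, network permitting: `for exp in fetch_experiments(build_search_url(FILTERS)): print(exp.get("accession"))`. Using `.get` keeps the loop from crashing if a field is absent from the search results.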

For the specific data you mentioned, I can't say for sure, but you can compare md5 checksums to see whether your files are identical to the archived data. One problem with the old data is that it can now be difficult to determine how they were generated. I have cc'd J Seth Strattan, our long-time pipeline expert, who may know some of their history. Based on my current knowledge of the ChIP-seq pipeline (explained below), it is possible that the input peaks have already been "removed" from the peak data, statistically rather than literally.
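The md5 comparison suggested above can be scripted in a few lines (it is equivalent to running `md5sum` on the file). A minimal Python sketch, where `archived_md5` is whatever checksum the ENCODE portal lists for the archived file; the helper names are hypothetical. Note that a match only proves the files are byte-identical: the same peaks stored in a different format or recompressed would give a different checksum, so a mismatch is not conclusive.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_archived(path: str, archived_md5: str) -> bool:
    """True if the local file's MD5 equals the archived checksum."""
    # Normalise whitespace and case so pasted checksums compare cleanly.
    return md5sum(path) == archived_md5.strip().lower()
```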

If you are able to use our new uniformly processed data, it would be beneficial and we would appreciate it. You can find information about the pipeline here: https://www.encodeproject.org/pages/pipelines/#DNA-binding . For histone ChIP-seq, we use MACS2 to call peaks, and the data from the control experiment are supplied as the input control during peak calling. The final H3K27ac peaks have therefore already been corrected for input: the input peaks have been statistically "removed". Again, Seth can provide more details if he feels it necessary and appropriate.
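To illustrate what "statistically removed" means, here is a deliberately simplified Python sketch loosely modelled on the local-lambda idea from MACS: the expected ChIP read count in a window is taken as the largest of a genome-wide background rate and the (depth-scaled) input counts near that window, and the window is called only if its ChIP count is improbably high under a Poisson model with that expectation. This is not MACS2's implementation; the window counts, scaling factor, cutoff, and function names are illustrative assumptions.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); each pmf term is computed in log space."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
              for i in range(k))
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding error


def local_lambda(control_counts, genome_bg, scale=1.0):
    """Expected ChIP reads: max of genome background and scaled input counts."""
    return max([genome_bg] + [scale * c for c in control_counts])


def is_enriched(chip_count, control_counts, genome_bg, scale=1.0, cutoff=1e-5):
    """Return (p-value, called?) for one window under the simplified model."""
    if genome_bg <= 0:
        raise ValueError("genome_bg must be positive")
    lam = local_lambda(control_counts, genome_bg, scale)
    p = poisson_sf(chip_count, lam)
    return p, p < cutoff
```

For example, a window with 40 ChIP reads is called when nearby input counts are low (`is_enriched(40, [6, 8], genome_bg=5.0)`), but the same 40 reads are not called when the input is also high (`is_enriched(40, [35, 38], genome_bg=5.0)`), because the input alone explains them. This is why no separate subtraction of input is needed afterwards.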

Thank you again for your questions. If you really need to use the old data, or if you have more questions, please feel free to contact us and we will see how we can help.

Posted 5.7 years ago by Shicheng Guo ★ 9.6k