Question

Map ENCODE DNaseI footprints to putative TFs


10.8 years ago

enricoferrero ▴ 920

Hi,

Has anybody used the DNaseI footprint data from ENCODE?

In the corresponding article, Neph et al. (2012) map the footprints to transcription factors by scanning the DNA sequences with motifs from TRANSFAC and JASPAR.

I'm looking for some type of data linking each footprint with its putative transcription factor(s), without having to repeat the motif finding/scanning analysis that was done in the paper.

Any help is appreciated, many thanks!

Footprint genomics ENCODE TF dnase • 2.9k views


Ram · Answer 1 · 2014-07-10


10.8 years ago

brentp 24k

You could use this track with the raw data from here.

It is a BED file of binding-site locations for 100+ TFs, and each interval is labelled with the TF it comes from.



Thanks, but no, those are transcription factor binding locations obtained through ChIP-seq. I'm after associations between TF motifs and DNaseI footprinting data.

Reply by enricoferrero ▴ 920 • 10.8 years ago

score 1 · Answer 2 · 2014-07-10

If it helps, footprint data are available here.

The hypersensitivity track data contain regions of DNaseI hypersensitivity (DHSs) called through a process developed in the Stamatoyannopoulos lab. The Nature Methods paper by Sabo et al. describes this in more detail.

The raw signal is a windowed measure of chromatin accessibility across the genome: the density of 5' ends of DNase cut fragments within a 150 bp window, computed every 20 bp.
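To make the windowing concrete, here is a minimal sketch of how such a signal could be computed from a list of 5' cut positions. It is not the lab's pipeline: the function name `windowed_density`, the choice to centre each window on its reported position, and the half-open window bounds are all assumptions made for illustration.

```python
from bisect import bisect_left


def windowed_density(cut_sites, region_start, region_end, window=150, step=20):
    """Count 5' cut sites in a `window`-bp window placed every `step` bp.

    Assumption: each window is centred on the reported position and is
    half-open, i.e. it covers [center - window // 2, center + window // 2).
    """
    sites = sorted(cut_sites)
    half = window // 2
    densities = []
    for center in range(region_start, region_end, step):
        lo = center - half
        hi = center + half
        # Number of sites s with lo <= s < hi, found by binary search.
        count = bisect_left(sites, hi) - bisect_left(sites, lo)
        densities.append((center, count))
    return densities


# Hypothetical cut positions on a single chromosome.
cuts = [100, 105, 110, 180, 400]
for center, count in windowed_density(cuts, 100, 141):
    print(center, count)
```

Because consecutive windows overlap by 130 bp, neighbouring values are strongly correlated, which is why the resulting track looks smooth.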

Peaks and hotspots are called from these regions.

Nearly all DHSs contain one or more footprints. A footprint is a shorter region of variable length (6-40 bp, I think) where DNase does not cleave because proteins or protein complexes, such as transcription factors or the transcription initiation machinery, are bound there (or because of related phenomena, like lack of methylation, which allows proteins to bind DNA).
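As a rough illustration of what "DNase does not cleave" looks like in the data, the sketch below compares per-base cleavage counts inside a candidate footprint with the counts in its flanks. This is only a toy depletion score, not the footprint-calling method from the paper; the function name `depletion_score`, the 10 bp flank size, and the pseudocount of 1 are assumptions.

```python
def depletion_score(cleavage, start, end, flank=10):
    """Ratio of mean cleavage inside [start, end) to mean cleavage in the flanks.

    `cleavage` is a list of per-base cut counts. Values well below 1 suggest
    a protected (footprint-like) region. A pseudocount of 1 is added to both
    means so that an uncut flank does not cause a division by zero.
    """
    left = cleavage[max(0, start - flank):start]
    right = cleavage[end:end + flank]
    center = cleavage[start:end]
    if not center or not left or not right:
        raise ValueError("footprint and both flanks must be non-empty")

    def mean(values):
        return sum(values) / len(values)

    flank_mean = (mean(left) + mean(right)) / 2
    return (mean(center) + 1) / (flank_mean + 1)


# Hypothetical profile: 10 bp cut heavily, 12 bp protected, 10 bp cut heavily.
profile = [8] * 10 + [0] * 12 + [8] * 10
print(depletion_score(profile, 10, 22))
```

A flat profile gives a score of 1, while a protected stretch flanked by heavy cleavage gives a score close to 0.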

See the Nature ENCODE paper by Neph et al., from the same lab, which explains this in more detail.

If you're doing TF prediction, the footprints will likely be of more use to you. Shane Neph's de novo motif discovery algorithm used these footprints to discover binding sites for 683 motifs. Of these, 394 matched entries in non-redundant, reduced databases of known and experimental transcription factor motifs (TRANSFAC, JASPAR Core, UniPROBE, and some data sets from the Kellis lab). The remaining 289 were found to be novel, occurred in millions of footprints, and showed similarities with known TFs.
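If you end up with footprint intervals on one side and motif match intervals on the other, linking the two is an interval-overlap problem. The sketch below assumes both are already loaded as BED-style tuples (0-based, half-open coordinates); the function name `link_footprints_to_motifs`, the `min_overlap` parameter, and the example coordinates and TF labels are hypothetical, and the motif hits themselves would have to come from a separate scan.

```python
from collections import defaultdict


def link_footprints_to_motifs(footprints, motif_hits, min_overlap=1):
    """Map each footprint to the sorted TF names of overlapping motif hits.

    footprints: iterable of (chrom, start, end)
    motif_hits: iterable of (chrom, start, end, tf_name)
    Coordinates are assumed to be BED-style: 0-based and half-open.
    """
    hits_by_chrom = defaultdict(list)
    for chrom, start, end, tf in motif_hits:
        hits_by_chrom[chrom].append((start, end, tf))

    links = {}
    for chrom, fp_start, fp_end in footprints:
        tfs = set()
        for start, end, tf in hits_by_chrom.get(chrom, []):
            # Length of the shared stretch; zero or negative means no overlap.
            overlap = min(fp_end, end) - max(fp_start, start)
            if overlap >= min_overlap:
                tfs.add(tf)
        links[(chrom, fp_start, fp_end)] = sorted(tfs)
    return links


# Made-up intervals, for illustration only.
footprints = [("chr1", 100, 120), ("chr1", 300, 315), ("chr2", 50, 70)]
motif_hits = [
    ("chr1", 105, 115, "CTCF"),
    ("chr1", 118, 130, "SP1"),
    ("chr1", 400, 410, "GATA1"),
    ("chr2", 60, 72, "NRF1"),
]
for footprint, tfs in link_footprints_to_motifs(footprints, motif_hits).items():
    print(footprint, tfs)
```

The inner loop compares every footprint with every hit on the same chromosome, which is fine for a small example but too slow for millions of footprints; for genome-scale files a sorted sweep or a dedicated interval tool would be a better fit.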