Hello.
I would like to download the ENCODE DNase I footprint data for the 41 human cell and tissue types described by Neph et al. (2012) Nature, 489:83-90.
Here is an example to illustrate my question. Consider only a single human cell type, K562. The following DNase I data files can be found via NCBI's GEO repository, under accession number GSE26328 (see http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM646567):
SRX/SRX037/SRX037116
GSM646567_hg19_wgEncodeUwDgfK562Aln.bam
GSM646567_hg19_wgEncodeUwDgfK562Hotspots.broadPeak.txt.gz
GSM646567_hg19_wgEncodeUwDgfK562Pk.narrowPeak.txt.gz
GSM646567_hg19_wgEncodeUwDgfK562Raw.bigWig
GSM646567_hg19_wgEncodeUwDgfK562Sig.bigWig
After downloading these files, it becomes apparent that none of them contains the actual footprints. Perhaps the most relevant file is GSM646567_hg19_wgEncodeUwDgfK562Hotspots.broadPeak.txt.gz, which contains 256,735 lines, each describing a DNase I Hypersensitive Zone (see http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUwDgf). Indeed, this number exactly matches the number of "Hotspot Regions" reported for K562 in Table S1 of the Neph et al. (2012) paper.
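For anyone who wants to reproduce that line count, here is a minimal Python sketch. The function name count_records is my own, and I am assuming the file is a gzip-compressed, tab-delimited text file in which any "track" or "#" header lines (if present at all) should not be counted.

```python
import gzip

# Hotspot count quoted above for K562 (Table S1, "Hotspot Regions").
EXPECTED_K562_HOTSPOTS = 256735


def count_records(path):
    """Count data lines in a gzip-compressed peak/BED-style text file.

    Assumption: blank lines, '#' comment lines and 'track' header lines
    are not records and are skipped.
    """
    n = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped or stripped.startswith(("#", "track")):
                continue
            n += 1
    return n
```

With the hotspot file in the working directory, count_records("GSM646567_hg19_wgEncodeUwDgfK562Hotspots.broadPeak.txt.gz") == EXPECTED_K562_HOTSPOTS should evaluate to True if the file really holds one record per hotspot region.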
What I am looking for, however, is a .bed file that lists the coordinates of the footprints that these Hotspot Regions contain. Based on Table S1, this file should have 498,683 lines (see "Number of Footprints" column).
Does anyone know where to find these files? Moreover, does anyone know where to find the Footprint Occupancy Scores for these 41 cell and tissue types?
Direct from the source! Thank you sjneph, much appreciated.