Question

ENCODE DNase Hypersensitivity narrowPeak scores

0

9.5 years ago

agd27 ▴ 130

Hi All,

I have a set of genomic locations that I would like to intersect with the relevant DNase hypersensitivity peaks from ENCODE. My question is simple, but the answer has been oddly hard to come by: what do the scores in column 7 mean? The documentation does not say how the scores are calculated, and a search for the peak caller it mentions (I-Max) came up empty. Here's a sample of lines from one of the files (full file at https://www.encodeproject.org/files/ENCFF001YNU/@@download/ENCFF001YNU.bed.gz).

chr1    3002740 3002890 .       0       .       25      6.60995 -1      -1
chr1    3058240 3058390 .       0       .       15      3.99799 -1      -1
chr1    3085640 3085790 .       0       .       68      60.6165 -1      -1
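For reference, the ten columns can be read against the UCSC narrowPeak layout (chrom, chromStart, chromEnd, name, score, strand, signalValue, pValue, qValue, peak), which makes column 7 the `signalValue` field. Below is a minimal sketch that parses the three sample lines under that assumption; the class and function names are made up for illustration, and the layout says nothing about how the values were computed for this particular file.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    # Field order follows the UCSC narrowPeak column layout.
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal_value: float  # column 7, the value asked about here
    p_value: float
    q_value: float
    peak: int


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    """Split one whitespace-delimited narrowPeak line into typed fields."""
    f = line.split()
    return NarrowPeak(
        f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
        float(f[6]), float(f[7]), float(f[8]), int(f[9]),
    )


sample = """\
chr1    3002740 3002890 .       0       .       25      6.60995 -1      -1
chr1    3058240 3058390 .       0       .       15      3.99799 -1      -1
chr1    3085640 3085790 .       0       .       68      60.6165 -1      -1
"""

peaks = [parse_narrowpeak_line(l) for l in sample.splitlines() if l.strip()]
for p in peaks:
    # Print the interval, its width, and the column 7 value.
    print(p.chrom, p.start, p.end, p.end - p.start, p.signal_value)
```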

My first inclination was that the scores indicated the number of cleavage sites in the 150-base windows, but they range up into the 1000s, so that is clearly incorrect. Average tag density, maybe? Hoping someone can steer me in the right direction. Thanks!

ChIP-Seq DNase-Hypersensitivity ENCODE • 3.6k views

updated 2.7 years ago by Ram 45k • written 9.5 years ago by agd27 ▴ 130

1

9.5 years ago

Ying W ★ 4.3k

I'm pretty sure they are using Hotspot, not I-Max. The Word document on that page talks a bit about its scoring metric.

0

Thanks for the thoughts! These are for the peaks, though, not the hotspots. I did check the Word document, and that is where I found the allusion to I-Max. (That document, by the way, appears to have been copy-pasted straight from the UCSC browser track description.) That said, I think I answered my own question after looking over the tracks corresponding to the files I'm using -- adding my answer below.

9.5 years ago by agd27 ▴ 130

0

I'm also interested in some DNase Hypersensitive peaks from ENCODE which refer to the "I-Max" peak finding algorithm. Did you figure out if this is some sort of obscure allusion or perhaps a typo? (I'm interested because I'm trying to replicate their peak calling process with another dataset). Thanks in advance!

8.9 years ago by linnaean • 0

score 1 · Accepted Answer · 2015-11-09

Well, after much head-scratching, I think I found an answer to my own question. For those interested: after comparing the file contents to the relevant browser tracks, it looks like the scores correspond to the maximum value of the raw signal track over the interval covered by the peak. So they do reflect observed 5' read ends within the 150 bp window, as I'd suspected. Hopefully this saves someone else the headache of having to figure this out some day!
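To make that comparison concrete, here is a minimal sketch of the check, assuming the raw signal track has been exported as bedGraph-style intervals (chrom, start, end, value). The function name and the signal values are made up for illustration and are not taken from the real track; only the peak coordinates come from the sample lines in the question.

```python
def max_signal_in_peak(peak, signal):
    """Return the largest signal value among intervals overlapping the peak.

    peak:   (chrom, start, end), 0-based half-open like BED
    signal: iterable of (chrom, start, end, value) intervals
    Returns None if no signal interval overlaps the peak.
    """
    chrom, start, end = peak
    values = [
        v for c, s, e, v in signal
        if c == chrom and s < end and e > start  # half-open overlap test
    ]
    return max(values) if values else None


# Hypothetical raw-signal intervals around the first sample peak.
signal = [
    ("chr1", 3002700, 3002740, 3.0),   # ends where the peak starts: no overlap
    ("chr1", 3002740, 3002800, 12.0),
    ("chr1", 3002800, 3002860, 25.0),
    ("chr1", 3002860, 3002900, 9.0),   # partially overlaps the peak
    ("chr1", 3002900, 3002950, 40.0),  # starts after the peak ends: no overlap
]

peak = ("chr1", 3002740, 3002890)
print(max_signal_in_peak(peak, signal))
```

If the value computed this way from the real signal track matches column 7 for a handful of peaks, that supports the interpretation above; a mismatch would suggest a different summary statistic.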