Hi everyone!!
I was trying to replicate a study (linked here), in which the authors write:
"We extracted DNA sequences for regions of 1001 bp centered at the enhancer midpoints (columns 7–8 of the BED12 file) using the BEDTools"
However, in the UCSC BED12 format description, the 7th and 8th columns are thickStart and thickEnd. Shouldn't the average of the start and end coordinates be considered the center of the enhancer? Please help me understand this.
For example:
chr1 956563 956812 chr1:956563-956812 0.01 . 956664 956665 0,0,0 2 57,102 0,147
For this line from the FANTOM5 BED12 enhancer file, how can 956664 be the center of the enhancer, when the center should be ((956812 - 956563) / 2) + 956563, i.e. 956687.5, or 956687 and 956688 as a range?
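To make the mismatch concrete, here is a minimal Python sketch that parses the example line, computes the plain interval midpoint, and builds a 1001 bp window around columns 7–8. The function names and the `flank=500` default are my own assumptions (500 + 1 + 500 = 1001); the quoted study does not spell out how the window was built, so treat this as one plausible reading rather than the authors' procedure.

```python
# Example enhancer line from the post (whitespace-separated BED12).
EXAMPLE = ("chr1 956563 956812 chr1:956563-956812 0.01 . "
           "956664 956665 0,0,0 2 57,102 0,147")


def parse_bed12(line):
    """Return the BED12 fields used in this sketch as a dict."""
    f = line.split()
    return {
        "chrom": f[0],
        "start": int(f[1]),        # column 2, chromStart (0-based)
        "end": int(f[2]),          # column 3, chromEnd (exclusive)
        "thick_start": int(f[6]),  # column 7, thickStart
        "thick_end": int(f[7]),    # column 8, thickEnd
    }


def interval_midpoint(rec):
    """Plain midpoint of chromStart and chromEnd."""
    return (rec["start"] + rec["end"]) / 2


def centered_window(rec, flank=500):
    """Half-open window of 2*flank + 1 bp around the 1 bp thick region.

    Assumption: the window is the thick region extended by `flank` bp
    on each side.
    """
    return rec["thick_start"] - flank, rec["thick_end"] + flank


rec = parse_bed12(EXAMPLE)
win_start, win_end = centered_window(rec)

print(interval_midpoint(rec))                 # 956687.5
print(rec["thick_start"], rec["thick_end"])   # 956664 956665
print(win_start, win_end, win_end - win_start)  # 956164 957165 1001
```

So for this line the interval midpoint and columns 7–8 really do differ by about 23 bp, which is the discrepancy I am asking about.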
I ran into exactly the same problem; reading the Anderson Lab blog and the original paper didn't help. I tried writing to FANTOM but didn't get an answer. If you have solved this issue, I'd love to hear about it.
I am sorry, but I still have no answer to this. I did, however, learn how the enhancer center is decided. I also wrote to them, but I think they are busy people. You can read this paper to gain more insight into how the centre is decided. I hope the figure itself explains it.
Also, if you want the PPT, it is in the additional files of the paper mentioned above, under the name 12864_2018_5016_MOESM15_ESM. If you need more help, we can discuss further by email: rohitsatyam102@gmail.com
Please give the comment a thumbs-up if it helped.
I read the official paper again, and I think I figured out how they set the center.
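The start and end coordinates mark the outer limits of the two CAGE-seq peaks (on opposite strands), while the center is calculated as the midpoint between the rightmost coordinate of the first peak and the leftmost coordinate of the second peak, i.e. the middle of the gap between them. In the example line above, the two blocks are 956563-956620 and 956710-956812, so the center is (956620 + 956710) / 2 = 956665, which matches column 8.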
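If anyone wants to check this on their own lines, here is a small Python sketch that decodes the two blocks from columns 11–12 (blockSizes and blockStarts) and takes the midpoint of the gap between them. The function names are mine, and I have only checked it against the single example line from this thread, so it is a sanity check rather than a confirmed description of the FANTOM5 pipeline.

```python
# Example enhancer line from the thread (whitespace-separated BED12).
EXAMPLE = ("chr1 956563 956812 chr1:956563-956812 0.01 . "
           "956664 956665 0,0,0 2 57,102 0,147")


def block_intervals(line):
    """Return absolute (start, end) coordinates of each BED12 block."""
    f = line.split()
    chrom_start = int(f[1])
    # BED12 allows a trailing comma in these lists, so strip it first.
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


def gap_midpoint(blocks):
    """Midpoint between the end of the first block and the start of the second.

    Assumption: the record has exactly two blocks, one per strand.
    """
    (_, first_end), (second_start, _) = blocks[0], blocks[1]
    return (first_end + second_start) / 2


blocks = block_intervals(EXAMPLE)
print(blocks)                # [(956563, 956620), (956710, 956812)]
print(gap_midpoint(blocks))  # 956665.0
```

For this line the gap midpoint lands on 956665, the value in column 8, whereas the midpoint of the outer coordinates would be 956687.5.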