Ambiguous fields in FANTOM 5 Enhancer_TSS_association.bed file
4.9 years ago

Hi all

I downloaded a file named "enhancer tss associations" from the enhancer database (Slidebase). However, I am having trouble working out what the coordinates in the first three columns of this BED file represent. I understand that the fourth column, with entries like "chr1:167440766-167441089; NM_052862; RCSD1; R:0.319; FDR:0", holds the enhancer coordinates, transcript accession number, gene symbol, some score (R), and false discovery rate. I am not sure what kind of score the R score is. I went through the paper by Andersson et al. that the website points to, but I couldn't find anything. Also, the last two columns don't make sense to me.

RNA-Seq gene genome • 1.3k views
4.8 years ago
Corentin ▴ 610

Hi,

The file is in the BED12 format: http://genome.ucsc.edu/FAQ/FAQformat.html#format1 . This format is used to display tracks on a Genome Browser.

The last two columns describe where blocks are drawn on the Genome Browser. In my understanding, one block represents the enhancer and the other the TSS. Column 11 (blockSizes) gives the length of each block, and column 12 (blockStarts) gives the start of each block relative to the start position on the chromosome (the second column).
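To make the block arithmetic concrete, here is a minimal Python sketch that converts the two block columns into absolute coordinates. The function name `bed12_blocks` and the example line are made up for illustration (the coordinates are not taken from the FANTOM5 file); the only thing it relies on is the BED12 rule that block starts are offsets from the second column.

```python
def bed12_blocks(line):
    """Return absolute (chrom, start, end) tuples for each block of a BED12 line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated BED12 fields")
    chrom = fields[0]
    chrom_start = int(fields[1])
    # Columns 11 and 12 are comma-separated lists; a trailing comma is allowed.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    # Block starts are offsets from chromStart, so add chromStart back.
    return [(chrom, chrom_start + s, chrom_start + s + n)
            for s, n in zip(starts, sizes)]


# Hypothetical line, invented for this example only.
example = "\t".join([
    "chr1", "1000", "6000", "example_name", "319", ".",
    "1000", "6000", "0,0,0", "2", "200,1", "0,4999",
])
print(bed12_blocks(example))
```

Running this on your own lines and comparing the resulting intervals with the enhancer coordinates in the 4th column should show which block, if either, lines up with the enhancer.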

You can see an example of the two blocks here (notice how the line name corresponds to the 4th column of your file):

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A858252%2D861621&hgsid=791359097_TnfoVJubF5SaAM0recpdIpTpvsGI

The R score (calculated as a Pearson correlation) represents the strength of the association between an enhancer and a TSS: the higher it is, the stronger the association. As you can see, the higher the R score, the higher the value in the "score" column (this is because the "score" column is used to draw the blocks in different shades of grey).
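If you want to filter associations by R or FDR, the 4th column can be split into its parts. This is only a sketch: the function name `parse_association_name` and the dictionary keys are my own, and it assumes every name has the same five semicolon-separated parts as the example in the question.

```python
def parse_association_name(name):
    """Split a name like 'chr1:167440766-167441089; NM_052862; RCSD1; R:0.319; FDR:0'."""
    parts = [p.strip() for p in name.split(";")]
    if len(parts) != 5:
        raise ValueError("expected 5 semicolon-separated parts, got %d" % len(parts))
    region, transcript, gene, r_field, fdr_field = parts
    chrom, span = region.split(":")
    start, end = span.split("-")
    return {
        "enhancer_chrom": chrom,
        "enhancer_start": int(start),
        "enhancer_end": int(end),
        "transcript": transcript,
        "gene": gene,
        # 'R:0.319' -> 0.319 and 'FDR:0' -> 0.0
        "r": float(r_field.split(":", 1)[1]),
        "fdr": float(fdr_field.split(":", 1)[1]),
    }


record = parse_association_name(
    "chr1:167440766-167441089; NM_052862; RCSD1; R:0.319; FDR:0"
)
print(record["gene"], record["r"], record["fdr"])
```

The parsed enhancer coordinates can then be compared directly with the first three columns of the same line.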

For more information you can also read the FANTOM5 paper: https://www.nature.com/articles/nature12787


Thanks, Corentin, for your explanation. However, it is still unclear to me what the first three columns of the BED file represent. I am sure they aren't the coordinates of the enhancers, and they aren't the TSS either. I would like to understand what the start and end coordinates refer to in this case.


I did some testing on the UCSC genome browser and it seems that the first three columns correspond to the whole feature (the enhancer + TSS). It is probably to make the genome browser display everything.

The coordinates do not exactly match the features (the range seems to start before and end after the actual enhancer and TSS, which is probably to make the view better?).

But since I have not found any documentation for it, I am not 100% sure. Let us know if you manage to find an answer.
