Hi, I want to get the 5'-UTR and 3'-UTR coordinates from the annotation file. I downloaded the "All GENCODE VM24" bed file from the UCSC Genome Browser. The format of the bed file is as follows:
bin: 1023
name: ENSMUST00000000090.7
chrom: chr9
strand: +
txStart: 57521278
txEnd: 57532426
cdsStart: 57521327
cdsEnd: 57531782
exonCount: 5
exonStarts: 57521278,57528955,57530240,57531668,57532281,
exonEnds: 57521415,57529072,57530362,57531792,57532426,
score: 0
name2: Cox5a
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: 0,1,1,0,-1,
For the 5'UTR, the length is cdsStart - txStart = 57521327 - 57521278 = 49, from (57521278, 57521327].
For the 3'UTR, the length is txEnd - cdsEnd = 57532426 - 57531782 = 644, from (57531782, 57532426].
However, the total length of all exons (i.e., the transcript) is only 645, which is hardly more than the 644 I get for the 3'UTR alone.
https://useast.ensembl.org/Mus_musculus/Transcript/Sequence_cDNA?db=core;g=ENSMUSG00000000088;r=9:57521279-57532426;t=ENSMUST00000000090
Is there anything wrong?
In addition, I find that the 5'UTR and 3'UTR lengths are 1 for some transcripts. Is that reasonable? In the linked answer "A: Easy Way To Get 3' Utr Lengths Of A List Of Genes", the 3'UTR length of OR4F5 is 0 and 1.
5utr ensembl_transcript_id
A ENSMUST00000054837
3utr ensembl_transcript_id
G ENSMUST00000073261
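Thanks very much. The real 3'UTR length is (57531792 - 57531782) + (57532426 - 57532281) = 10 + 145 = 155, because the 3'UTR covers the end of exon 4 plus all of exon 5, and the intron between them must not be counted. Thanks for the explanation of non-coding RNA and the UTR length. It's very clear.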
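
For anyone who wants to script this, here is a minimal Python sketch of the exon-aware calculation. It assumes the 0-based, half-open coordinates that UCSC tables use. It also assumes that non-coding transcripts have cdsStart equal to cdsEnd, which is my understanding of the UCSC convention and worth checking against your own file. The function names `overlap` and `utr_lengths` are made up for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two 0-based, half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def utr_lengths(strand, tx_start, tx_end, cds_start, cds_end,
                exon_starts, exon_ends):
    """Return (transcript_len, utr5_len, utr3_len), counting exonic bases only."""
    # exonStarts / exonEnds are comma-separated strings with a trailing comma.
    starts = [int(x) for x in exon_starts.strip(",").split(",")]
    ends = [int(x) for x in exon_ends.strip(",").split(",")]
    exons = list(zip(starts, ends))
    transcript_len = sum(e - s for s, e in exons)

    if cds_start == cds_end:
        # Assumed convention for non-coding transcripts: no CDS, so no UTRs.
        return transcript_len, 0, 0

    # Exonic bases on the genomic left and right of the CDS; introns are skipped
    # because only the part of each exon inside the flank is counted.
    left = sum(overlap(s, e, tx_start, cds_start) for s, e in exons)
    right = sum(overlap(s, e, cds_end, tx_end) for s, e in exons)

    # On the minus strand the genomic left flank is the 3'UTR.
    utr5, utr3 = (left, right) if strand == "+" else (right, left)
    return transcript_len, utr5, utr3


print(utr_lengths("+", 57521278, 57532426, 57521327, 57531782,
                  "57521278,57528955,57530240,57531668,57532281,",
                  "57521415,57529072,57530362,57531792,57532426,"))
# (645, 49, 155)
```

For ENSMUST00000000090.7 this gives a transcript length of 645, a 5'UTR of 49 and a 3'UTR of 155, which leaves 441 bases for the CDS.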