I was working with the RefSeq genes file for hg19 to define genomic regions, and I downloaded 4 files from the UCSC Table Browser:
1. All exons
2. 5'UTR exons
3. 3'UTR exons
4. Coding exons
I realised that the number of records in 2+3+4 is greater than in 1. I can't see at the moment why this should be the case. Shouldn't 5'UTR + 3'UTR + coding exons add up to the total number of exons in the genome? Or am I missing something obvious?
One possibility is that a UTR can span multiple exons, and a single exon can contain both UTR and coding sequence. Such an exon is one record in the all-exons file but shows up in both a UTR file and the coding exons file, so it is counted more than once. To check, take the UTR coordinates and see whether they fall inside the same exons as the records in the coding exons file.
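To make the counting argument concrete, here is a minimal sketch in Python. It is not the Table Browser's actual implementation: the function name `split_exons`, the coordinates, and the plus-strand, half-open interval convention are all assumptions made up for illustration. It splits each exon of one hypothetical transcript at the CDS boundaries and compares the record counts.

```python
def split_exons(exons, cds_start, cds_end):
    """Split plus-strand exons (half-open start, end) at the CDS boundaries.

    Hypothetical helper for illustration only; returns three lists of
    (start, end) pieces: 5'UTR, coding, and 3'UTR.
    """
    utr5, coding, utr3 = [], [], []
    for start, end in exons:
        # Part of the exon upstream of the CDS start is 5'UTR.
        if start < cds_start:
            utr5.append((start, min(end, cds_start)))
        # Part of the exon overlapping the CDS is coding.
        if end > cds_start and start < cds_end:
            coding.append((max(start, cds_start), min(end, cds_end)))
        # Part of the exon downstream of the CDS end is 3'UTR.
        if end > cds_end:
            utr3.append((max(start, cds_end), end))
    return utr5, coding, utr3


# Made-up transcript: 3 exons, CDS starts inside exon 1 and ends inside exon 3.
exons = [(100, 200), (300, 400), (500, 600)]
utr5, coding, utr3 = split_exons(exons, cds_start=150, cds_end=550)

print("all exons:", len(exons))
print("5'UTR + coding + 3'UTR:", len(utr5) + len(coding) + len(utr3))
```

In this toy case the first and last exons each contribute two records (one UTR piece and one coding piece), so the three split files together hold 5 records for only 3 exons.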
written 10.1 years ago by Renesh, updated 3.7 years ago by Ram
To add to this, there are also noncoding RNAs in RefSeq, which have a 5'UTR sequence and no coding exons.
I can guarantee that this is the case. UTRs will commonly share exons with coding sequence.