Hi,
I was working with the RefSeq genes hg19 file to define genomic regions, and I downloaded 4 files from the UCSC Table Browser:
1. All exons
2. 5'UTR exons
3. 3'UTR exons
4. Coding exons
I realised that the number of records in files 2+3+4 combined is greater than in file 1. I can't see at the moment why this should be the case. Shouldn't the 5'UTR + 3'UTR + coding exons add up to the total number of exons in the genome? Or am I missing something obvious?
Thanks!
Aditi
To add to this, there are also noncoding RNAs in RefSeq, which have 5'UTR sequence and no coding exons.
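I can guarantee that this is the case. UTRs commonly share exons with coding sequence: an exon that contains the start or stop codon is part UTR and part coding, so it shows up once in the UTR file and again in the coding exons file.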
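To make the counting concrete, here is a minimal sketch of how one transcript's exons could be split into 5'UTR, coding and 3'UTR pieces. The function name `split_exons`, the coordinates and the plus-strand, 0-based half-open convention are all assumptions made up for illustration; this is not the Table Browser's actual code.

```python
def split_exons(exons, cds_start, cds_end):
    """Split plus-strand exons into 5'UTR, coding and 3'UTR pieces.

    exons: list of (start, end) tuples, 0-based half-open (assumed).
    cds_start, cds_end: coding region bounds in the same coordinates.
    """
    five_utr, coding, three_utr = [], [], []
    for start, end in exons:
        # Part of the exon lying before the coding region
        if start < cds_start:
            five_utr.append((start, min(end, cds_start)))
        # Part of the exon overlapping the coding region
        lo, hi = max(start, cds_start), min(end, cds_end)
        if lo < hi:
            coding.append((lo, hi))
        # Part of the exon lying after the coding region
        if end > cds_end:
            three_utr.append((max(start, cds_end), end))
    return five_utr, coding, three_utr


# Hypothetical transcript: 3 exons, start codon inside exon 1,
# stop codon inside exon 3.
exons = [(100, 200), (300, 400), (500, 600)]
five_utr, coding, three_utr = split_exons(exons, cds_start=150, cds_end=550)

print(len(exons))                                    # records in "all exons"
print(len(five_utr) + len(coding) + len(three_utr))  # records in the 3 other files
```

With these made-up coordinates the first and last exons are each split into a UTR piece and a coding piece, so the three files together hold more records than the all-exons file, even though they cover exactly the same bases.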