There have been many studies reporting and annotating UTRs in S. cerevisiae. However, the standard databases such as SGD and ENSEMBL do not provide the complete transcript sequence; they just provide the ORF.
I also checked the folder called cDNA in ENSEMBL. All the sequences start with ATG and end with one of the stop codons (TAA: 47%, TAG: 23%, TGA: 30%).
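For anyone who wants to repeat this kind of check, here is a minimal sketch of how the start and stop codons could be tallied. It is not the exact procedure used for the numbers above. The helper names (`read_fasta`, `codon_summary`) and the toy records are hypothetical and only illustrate the idea; no real ENSEMBL data is included.

```python
from collections import Counter
from io import StringIO

STOP_CODONS = ("TAA", "TAG", "TGA")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def codon_summary(records):
    """Count records, ATG starts, and the last three bases of each sequence."""
    total = 0
    atg_starts = 0
    stops = Counter()
    for _, seq in records:
        total += 1
        if seq.startswith("ATG"):
            atg_starts += 1
        last = seq[-3:]
        # Anything that is not a standard stop codon is grouped as "other".
        stops[last if last in STOP_CODONS else "other"] += 1
    return total, atg_starts, stops


# Toy input (made up for illustration, not real yeast sequences).
toy_fasta = StringIO(
    ">toy1\nATGAAATAA\n"
    ">toy2\nATGCCC\nTAG\n"
    ">toy3\nATGGGGTGA\n"
    ">toy4\nATGTTTTAA\n"
)

total, atg_starts, stops = codon_summary(read_fasta(toy_fasta))
print(f"{atg_starts}/{total} sequences start with ATG")
for codon in STOP_CODONS + ("other",):
    print(f"{codon}: {100 * stops[codon] / total:.0f}%")
```

To run it on a downloaded file, replace the `StringIO` object with an open text handle, for example `open(path)` or `gzip.open(path, "rt")` for a compressed file, where `path` is a placeholder for wherever the cDNA FASTA was saved.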
Is there a good reason for not reporting the complete sequence when that information is already available?
That was true for the pre-genomics era. Now it is easy to upload transcriptomics data; if anything, it is additional work to find the ORFs and upload only those. ENSEMBL is supposed to have all the necessary setup for annotation, right? Yet they haven't included these features.