Dear all,
I am looking for 5'UTR and 3'UTR annotations for the tilapia genome. I went to the UCSC Table Browser to try to download BED files of these UTRs. I tried track:Refseq Genes, table:refGene and downloaded the BED file. Very few UTRs were annotated (about 5 per chromosome), so I believe the genome sequencing team did not annotate the UTRs.
Then I moved on to track:Refseq Genes, table:xenoRefGene and downloaded the BED files. This time the Table Browser returned a full list of genes annotated on the reference genome, but every UTR was simply the 200 bp upstream of the start codon: all 5'UTRs were exactly 200 bp long.
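To make the "200 bp next to the start codon" idea concrete, here is a minimal sketch of how such a fixed-length proxy window could be computed from CDS coordinates. It assumes BED-style 0-based, half-open coordinates, and `flank_window` is a hypothetical helper, not a UCSC or Ensembl function. The important detail is strand: on the minus strand the start codon sits at the CDS end, so "upstream" points to higher genomic coordinates.

```python
from typing import Optional, Tuple


def flank_window(cds_start: int, cds_end: int, strand: str, side: str,
                 length: int = 200,
                 chrom_len: Optional[int] = None) -> Tuple[int, int]:
    """Return a 0-based, half-open (BED-style) window next to a CDS.

    Hypothetical helper for illustration only.
    side='5p': `length` bp upstream of the start codon (a proxy 5'UTR).
    side='3p': `length` bp downstream of the stop codon (a proxy 3'UTR).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if side not in ("5p", "3p"):
        raise ValueError("side must be '5p' or '3p'")
    # The window lies at lower genomic coordinates (left of the CDS) for a
    # 5' window on '+' or a 3' window on '-'; otherwise it lies to the right.
    window_on_left = (side == "5p") == (strand == "+")
    if window_on_left:
        start, end = max(0, cds_start - length), cds_start
    else:
        start, end = cds_end, cds_end + length
        if chrom_len is not None:
            end = min(end, chrom_len)  # do not run off the chromosome end
    return start, end
```

For a CDS spanning 1000 to 2500, `flank_window(1000, 2500, "+", "5p")` gives `(800, 1000)`, while the same call with `"-"` gives `(2500, 2700)`. Such a window is only a positional guess: real UTRs vary widely in length and can span introns.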
- I'm a little confused now: can this dataset be used? If no 5'UTRs were annotated, can we arbitrarily define the 200 bp upstream of the start codon as the 5'UTR, and likewise the 200 bp downstream of the stop codon as the 3'UTR?
- If it can't be used, how should I go about finding out whether my sequencing hits of interest fall in UTR regions?
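Once any UTR BED file is available, checking whether hits fall in UTRs is an interval-overlap query; `bedtools intersect` is the usual command-line tool for this. As a self-contained illustration, here is a minimal Python sketch that assumes plain BED input (chrom, start, end in the first three tab-separated columns, 0-based half-open). `load_bed` and `overlaps` are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open, as in BED


def load_bed(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Read BED lines into sorted, merged (non-overlapping) intervals per chromosome."""
    raw: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        # Skip blank lines, comments and UCSC header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    merged: Dict[str, List[Interval]] = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out: List[Interval] = []
        for s, e in ivs:
            if out and s <= out[-1][1]:  # overlapping or touching: extend
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def overlaps(index: Dict[str, List[Interval]], chrom: str,
             start: int, end: int) -> bool:
    """True if the half-open hit [start, end) overlaps any interval on chrom."""
    ivs = index.get(chrom, [])
    # Index of the last merged interval whose start is < the hit's end.
    i = bisect.bisect_left(ivs, (end,)) - 1
    # Merged intervals are disjoint and sorted, so only that one can overlap.
    return i >= 0 and ivs[i][1] > start
```

Merging first keeps the binary search correct: after merging, interval ends increase along with starts, so a single comparison decides the overlap.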
regards,
kenta
Do you mean going to the BioMart section to download the UTR annotations?
Or just download the GTF and filter it as needed. Either way would work.
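Filtering the GTF for UTRs can be sketched as below. This assumes UTR rows are labelled `five_prime_utr` / `three_prime_utr` in column 3, as in recent Ensembl GTFs; a generic `UTR` type is also accepted in case a file uses it. GTF coordinates are 1-based and inclusive, so the start is shifted by one to produce BED. `gtf_utrs_to_bed` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Feature types treated as UTRs (assumption: Ensembl-style labels).
UTR_TYPES = {"five_prime_utr", "three_prime_utr", "UTR"}
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_utrs_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield BED6 lines (without newline) for UTR features in GTF lines."""
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] not in UTR_TYPES:
            continue
        m = TX_RE.search(f[8])
        tx = m.group(1) if m else "."
        start0 = int(f[3]) - 1  # GTF 1-based inclusive -> BED 0-based half-open
        # name column records transcript and UTR type; score set to 0
        yield "\t".join([f[0], str(start0), f[4], f"{tx}|{f[2]}", "0", f[6]])
```

Any iterable of lines works as input, so a gzipped GTF can be read with `gzip.open(path, "rt")`. If this yields nothing for a given GTF, that file simply has no explicit UTR rows.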
I have checked. The tilapia UTRs are not annotated in the Ensembl database. I wonder how bioinformaticians annotate UTRs. Is it possible to use, say, the zebrafish transcriptome to annotate the Nile tilapia genome, including its UTRs?
They're in the GTF file that I downloaded, so just check there. You can find some details on Ensembl's annotation process for this species here.
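Even without explicit UTR rows, a transcript model whose exons extend beyond its coding sequence implies UTRs: they are the exonic parts not covered by coding sequence. Below is a minimal sketch of that derivation for a single transcript. It works in GTF's 1-based inclusive coordinates and assumes the caller passes both `CDS` and `stop_codon` segments as coding, because in GTF2.2 the `CDS` feature excludes the stop codon. The function names are hypothetical. This approach recovers nothing when the exons stop exactly at the start and stop codons.

```python
from typing import List, Tuple

Seg = Tuple[int, int]  # 1-based, inclusive (GTF convention)


def subtract(exon: Seg, coding: List[Seg]) -> List[Seg]:
    """Return the parts of `exon` not covered by any coding segment."""
    pieces = [exon]
    for cs, ce in coding:
        nxt: List[Seg] = []
        for s, e in pieces:
            if ce < s or cs > e:  # no overlap: keep the piece whole
                nxt.append((s, e))
                continue
            if s < cs:  # part left of the coding segment
                nxt.append((s, cs - 1))
            if e > ce:  # part right of the coding segment
                nxt.append((ce + 1, e))
        pieces = nxt
    return pieces


def derive_utrs(exons: List[Seg], coding: List[Seg],
                strand: str) -> List[Tuple[int, int, str]]:
    """Label exonic, non-coding pieces of one transcript as 5'UTR or 3'UTR."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if not coding:
        return []  # non-coding transcript: no UTRs to call
    lo = min(s for s, _ in coding)
    hi = max(e for _, e in coding)
    out: List[Tuple[int, int, str]] = []
    for exon in sorted(exons):
        for s, e in subtract(exon, coding):
            if e < lo:  # before the coding span in genome coordinates
                side = "5'UTR" if strand == "+" else "3'UTR"
            elif s > hi:  # after the coding span in genome coordinates
                side = "3'UTR" if strand == "+" else "5'UTR"
            else:
                continue  # uncovered gap inside the coding span: not a UTR
            out.append((s, e, side))
    return out
```

Grouping GTF rows by `transcript_id` before calling `derive_utrs` is left out to keep the sketch short.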