Hello, I have a question about a gene annotation I recently downloaded in GFF3 format. Below is an abbreviated example showing the first few lines of the file:
##gff-version 3
# Generated on Tue Nov 27 19:25:49 2012
# UCSC table file ./ucsc_tables/hg19/ensGene.txt
chr1 ensGene gene 11869 14412 . + . Name=..
chr1 ensGene ncRNA 11869 14409 . + . Name=..
chr1 ensGene exon 11869 12227 . + . Name=..
chr1 ensGene exon 12613 12721 . + . Name=..
..
..
chr1 ensGene gene 14363 29806 . - . Name=..
chr1 ensGene ncRNA 14363 29370 . - . Name=..
chr1 ensGene exon 14363 14829 . - . Name=..
..
..
As shown above, each gene is followed by an arbitrary number of exons.
My question: is it correct to assume that the start and end coordinates of a listed gene represent the transcription start site (TSS) and transcription termination site (TTS)?
I need these two positions to measure the distance to certain alternative splicing events that I computed with MISO (unfortunately, the MISO output doesn't provide them).
Best regards
My old question on this subject may help you, with adjustments for your genome of interest. There I include some scripting to grab TSS coordinates from an Ensembl GTF file or via the Ensembl Perl API. You will need to take the strand of each annotation into account to turn its coordinates into a useful TSS value: on the + strand the TSS is the start coordinate, while on the - strand it is the end coordinate.
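To illustrate the strand handling, here is a minimal Python sketch (not the scripting from my old question). It assumes a tab-delimited GFF3 file with 1-based, inclusive coordinates, and it reads the transcript-level records (ncRNA in your example) rather than the gene records, since a gene record normally spans all of its transcripts. The function names and the list of feature types are my own choices for illustration, so adjust them to whatever your file actually contains.

```python
def tss_tts(start, end, strand):
    """Return (TSS, TTS) for a 1-based, inclusive GFF3 interval."""
    if strand == "+":
        return start, end   # transcription runs left to right
    if strand == "-":
        return end, start   # transcription runs right to left
    raise ValueError(f"unsupported strand: {strand!r}")


def parse_gff3_line(line, feature_types=("ncRNA", "mRNA", "transcript")):
    """Parse one GFF3 line; return None for comments and other feature types.

    feature_types is an assumption: list the transcript-level types
    that actually occur in your file.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] not in feature_types:
        return None
    seqid, _source, ftype, start, end, _score, strand = fields[:7]
    tss, tts = tss_tts(int(start), int(end), strand)
    return {"seqid": seqid, "type": ftype, "strand": strand,
            "tss": tss, "tts": tts}


# Two transcript records taken from the example in the question,
# written with explicit tabs.
example = [
    "##gff-version 3",
    "chr1\tensGene\tncRNA\t11869\t14409\t.\t+\t.\tName=..",
    "chr1\tensGene\tncRNA\t14363\t29370\t.\t-\t.\tName=..",
]

records = [r for r in map(parse_gff3_line, example) if r is not None]
for r in records:
    print(r["seqid"], r["strand"], "TSS:", r["tss"], "TTS:", r["tts"])
```

For the minus-strand transcript the TSS comes out as 29370 and the TTS as 14363, i.e. the reverse of the start and end columns. If you do work from the gene records instead, keep in mind that their coordinates only give the outermost boundaries across all transcripts of that gene.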