Question

TSS / TTS in Ensembl gene annotation?

9.6 years ago

thefirstrealace ▴ 30

Hello, I have a question about a gene annotation that I recently downloaded in GFF3 format. Below is an abbreviated example containing the first few lines of this file:

##gff-version 3
# Generated on Tue Nov 27 19:25:49 2012
# UCSC table file ./ucsc_tables/hg19/ensGene.txt
chr1    ensGene    gene       11869    14412    .    +    .    Name=..
chr1    ensGene    ncRNA    11869    14409    .    +    .    Name=..
chr1    ensGene    exon       11869    12227    .    +    .    Name=..
chr1    ensGene    exon       12613    12721    .    +    .    Name=..
..
..
chr1    ensGene    gene       14363    29806    .    -    .    Name=..
chr1    ensGene    ncRNA    14363    29370    .    -    .    Name=..
chr1    ensGene    exon       14363    14829    .    -    .    Name=..
..
..

As shown above, for each gene, there is an arbitrary number of exons listed for it.

My question: Is it correct to assume that the start and end coordinates of a listed gene represent the transcription start site (TSS) and transcription termination site (TTS)?

I need these two properties to measure the distance to certain alternative splice events, which I have computed with MISO (unfortunately, the MISO output doesn't provide these two properties).

Best regards

ensembl gene-annotation gff • 13k views

updated 3.0 years ago by Ram 45k • written 9.6 years ago by thefirstrealace ▴ 30

My old question on this subject may help you, with adjustments for your genome of interest. I include some scripting to grab TSS coordinates from an Ensembl GTF or via their Perl API. You will need to take into account the strand each annotation is assigned to in order to derive a useful TSS value from its coordinates.
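To illustrate the strand point, here is a minimal sketch (not the scripting referred to above). It assumes 1-based, inclusive GFF coordinates; the function name `tss_and_tts` is hypothetical and chosen only for this example.

```python
def tss_and_tts(start, end, strand):
    """Return (TSS, TTS) for one feature with 1-based inclusive GFF coordinates."""
    if strand == "+":
        # Forward strand: transcription runs from start to end.
        return start, end
    if strand == "-":
        # Reverse strand: transcription runs from end to start.
        return end, start
    raise ValueError(f"unsupported strand: {strand!r}")


# Coordinates of the two gene lines shown in the question.
print(tss_and_tts(11869, 14412, "+"))  # (11869, 14412)
print(tss_and_tts(14363, 29806, "-"))  # (29806, 14363)
```

Applied to a gene line, this gives the outermost 5' and 3' ends of the gene, not the ends of each individual transcript.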

ADD REPLY • link 9.6 years ago by Alex Reynolds 36k

score 6 · Accepted Answer · 2016-01-12

9.6 years ago

Emily 24k

The start coordinate of a forward-strand gene and the end coordinate of a reverse-strand gene represent the TSS of the gene's most 5' transcript. Other transcripts of the gene may have different TSSs. To get all TSSs, you should use the cDNA features in the file.
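As a rough sketch of collecting every TSS rather than just the gene-level one, the Python below reads GFF3 lines and records the strand-aware start of each transcript-level feature. It makes several assumptions: the real file is tab-separated (the paste in the question shows spaces), every feature type other than `gene` and `exon` is treated as a transcript, and results are grouped by chromosome and strand because the attribute column is elided in the question. The name `collect_tss` and its `skip_types` parameter are hypothetical.

```python
from collections import defaultdict


def collect_tss(gff_lines, skip_types=("gene", "exon")):
    """Map (seqid, strand) to the sorted unique TSS positions of transcript-level features."""
    tss_by_locus = defaultdict(set)
    for line in gff_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # skip malformed or elided lines
        seqid, _source, ftype, start, end, _score, strand = fields[:7]
        if ftype in skip_types or strand not in ("+", "-"):
            continue
        # Forward strand: TSS is the start; reverse strand: TSS is the end.
        tss = int(start) if strand == "+" else int(end)
        tss_by_locus[(seqid, strand)].add(tss)
    return {key: sorted(values) for key, values in tss_by_locus.items()}


# Lines adapted from the question, with tabs as separators.
example = [
    "##gff-version 3",
    "chr1\tensGene\tgene\t11869\t14412\t.\t+\t.\tName=..",
    "chr1\tensGene\tncRNA\t11869\t14409\t.\t+\t.\tName=..",
    "chr1\tensGene\texon\t11869\t12227\t.\t+\t.\tName=..",
    "chr1\tensGene\tgene\t14363\t29806\t.\t-\t.\tName=..",
    "chr1\tensGene\tncRNA\t14363\t29370\t.\t-\t.\tName=..",
]
print(collect_tss(example))
# {('chr1', '+'): [11869], ('chr1', '-'): [29370]}
```

In this excerpt the reverse-strand ncRNA has its TSS at 29370 while its gene ends at 29806, so the gene-level coordinate does not describe that particular transcript.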


Thank you very much for your help. For some reason I never considered the cDNA features in this file, but it actually makes perfect sense :)

9.6 years ago by thefirstrealace ▴ 30

Hi Emily, my annotation contains "gene", "transcript" and "exon" features. Should I define the TSS from the transcript start or the gene start?