TERT isoform discrepancies UCSC vs Ensembl
8.1 years ago

Hi,

I'm curious why the annotation of TERT isoforms differs so much between UCSC and Ensembl. The Ensembl isoforms match those in UniProt, but UCSC's are very different. Any idea why this is the case?

Ensembl: http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000164362;r=5:1253147-1295069

UCSC: Not sure how to link to this; search for TERT and click the individual isoforms in the genome browser

UniProt: http://www.uniprot.org/uniprot/O14746#sequences

Floris

gene isoform ensembl ucsc uniprot • 2.4k views

Here's UCSC: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr5%3A1253147-1295069&hgsid=482326585_Pd9u1fAykS8a9ScOggz7O8XDbDz5

Which UCSC tracks are you referring to and can you tell me what you think is particularly different? The structures look pretty similar to me.


So according to UCSC hg19 there are ten TERT isoforms:

22396   uc003jbz.1   2586bp
22396   uc003jca.1   3977bp
22396   uc003jcb.1   4018bp <-- main canonical isoform
22396   uc003jcc.1   3829bp
22396   uc003jcd.1   3606bp
22396   uc003jce.1   3642bp
22396   uc021xwa.1   321bp
22396   uc021xwb.1   402bp
22396   uc021xwc.1   966bp
22396   uc021xvz.1   990bp

According to Ensembl GRCh37 there are eight isoforms:

Name      Transcript ID    bp    Protein
TERT-001  ENST00000310581  4018  1132aa <-- main canonical isoform
TERT-005  ENST00000334602  3210  1069aa
TERT-201  ENST00000296820  3829  807aa
TERT-006  ENST00000508104  2486  807aa
TERT-004  ENST00000460137  2992  795aa
TERT-008  ENST00000522877  408   No protein
TERT-003  ENST00000484238  2422  No protein
TERT-007  ENST00000503656  406   No protein

Only the canonical TERT is given in both annotations; all the others seem to differ from each other.
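
A quick way to cross-check the two lists above is to pair transcripts by length. This is only a rough sketch: the lengths are copied from the lists in this post, the function name `shared_lengths` is made up for illustration, and equal length is just a hint that two transcripts might be the same model, not proof (exon boundaries would need to be compared for that).

```python
# Transcript lengths (bp) copied from the UCSC hg19 and Ensembl GRCh37 lists above.
ucsc = {
    "uc003jbz.1": 2586, "uc003jca.1": 3977, "uc003jcb.1": 4018,
    "uc003jcc.1": 3829, "uc003jcd.1": 3606, "uc003jce.1": 3642,
    "uc021xwa.1": 321, "uc021xwb.1": 402, "uc021xwc.1": 966,
    "uc021xvz.1": 990,
}
ensembl = {
    "ENST00000310581": 4018, "ENST00000334602": 3210,
    "ENST00000296820": 3829, "ENST00000508104": 2486,
    "ENST00000460137": 2992, "ENST00000522877": 408,
    "ENST00000484238": 2422, "ENST00000503656": 406,
}


def shared_lengths(a, b):
    """Return {length: [(id_a, id_b), ...]} for lengths present in both annotations.

    Hypothetical helper: same length is a crude heuristic, not evidence
    that the exon structures are identical.
    """
    matches = {}
    for id_a, len_a in a.items():
        for id_b, len_b in b.items():
            if len_a == len_b:
                matches.setdefault(len_a, []).append((id_a, id_b))
    return matches


matches = shared_lengths(ucsc, ensembl)
for length, pairs in sorted(matches.items()):
    print(length, pairs)
```

With the numbers as posted, two lengths turn up in both lists: 4018 bp (uc003jcb.1 and ENST00000310581, the canonical isoform) and 3829 bp (uc003jcc.1 and ENST00000296820). Whether the second pair is really the same transcript would still need checking in the browser.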


Furthermore, four transcripts in Ensembl are marked as protein coding and match proteins described in UniProt:

Isoform 1 (identifier: O14746-1)
This isoform has been chosen as the 'canonical' sequence. All positional information in this entry refers to it. This is also the sequence that appears in the downloadable versions of the entry.
Length:1,132

Isoform 2 (identifier: O14746-2)
The sequence of this isoform differs from the canonical sequence as follows:
     764-807: STLTDLQPYM...LNEASSGLFD → LRPVPGDPAG...AGRAAPAFGG
     808-1132: Missing.
Length:807


Isoform 3 (identifier: O14746-3)
The sequence of this isoform differs from the canonical sequence as follows:
     885-947: Missing.
Note: May be produced at very low levels due to a premature stop codon in the mRNA, leading to nonsense-mediated mRNA decay. No experimental confirmation available.
Length:1,069


Isoform 4 (identifier: O14746-4)
The sequence of this isoform differs from the canonical sequence as follows:
     711-722: Missing.
     764-807: STLTDLQPYM...LNEASSGLFD → LRPVPGDPAG...AGRAAPAFGG
     808-1132: Missing.
Length:795
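
The isoform lengths above can be reproduced from the listed differences, which is a handy sanity check when matching them to the Ensembl protein lengths. A minimal sketch follows; one assumption is flagged: the replacement sequence for residues 764-807 is elided in the listing, so it is assumed here to be 44 residues long (the same as the segment it replaces), which is consistent with the stated length of 807. The names `ISOFORM_CHANGES` and `isoform_length` are invented for this example.

```python
CANONICAL_LENGTH = 1132  # isoform 1 (O14746-1), as listed above

# Each change is (start, end, replacement_length), 1-based inclusive.
# replacement_length 0 means the segment is "Missing".
# ASSUMPTION: the 764-807 replacement is 44 residues long; the listing
# elides the sequence, so this value is inferred from the stated lengths.
ISOFORM_CHANGES = {
    "O14746-2": [(764, 807, 44), (808, 1132, 0)],
    "O14746-3": [(885, 947, 0)],
    "O14746-4": [(711, 722, 0), (764, 807, 44), (808, 1132, 0)],
}


def isoform_length(canonical_length, changes):
    """Apply each replaced or missing segment to the canonical length."""
    length = canonical_length
    for start, end, replacement_length in changes:
        removed = end - start + 1  # inclusive coordinates
        length += replacement_length - removed
    return length


lengths = {
    isoform: isoform_length(CANONICAL_LENGTH, changes)
    for isoform, changes in ISOFORM_CHANGES.items()
}
for isoform, length in lengths.items():
    print(isoform, length)
```

Under that assumption the computed lengths are 807, 1069 and 795, the same as the protein lengths Ensembl lists for TERT-201/TERT-006, TERT-005 and TERT-004 respectively.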

I wouldn't put much trust in alternative isoforms. They are usually not evolutionarily conserved and lack evidence at the protein level. On top of that, their definition varies from one dataset to another (Ensembl vs UCSC).


@Emily: Your link points to the GRCh38 genome build.

Since @floris is referring to GRCh37 this would be the link to use instead.


A simple explanation may be the difference in how UCSC "genes" are built compared with Ensembl's gene models. The UCSC method is described in the track description here.
