Hi everyone,
I am working with Ensembl stable IDs for transcripts (ENSTs). I thought the idea of a stable ID was that it points to the same transcript, identical in sequence (DNA and protein), across different releases of the database.
However, I found some ENSTs whose sequence changes substantially between releases.
Examples:
SYNGAP1 ENST00000418600 - lost a coding exon (-58 aa, -88 bp), based on a different Vega transcript:
ENST00000418600 Ensembl release 78
ENST00000418600 Ensembl release 81
CTDP1 ENST00000299543 - lost a coding exon (-119 aa, -373 bp), different annotation method:
ENST00000299543 Ensembl release 75
ENST00000299543 Ensembl release 81
So I have some questions arising from this:
1. What is guaranteed to stay stable for Ensembl "stable" IDs (mainly ENSTs)?
All information I could find on this is:
2. Why does the community use ENSTs without version numbers (which exist and, according to the documentation, would guarantee sequence stability), while RefSeq NM_ accessions are usually cited with a version? Examples:
CCDS, HGVS (recommending variant annotation on ENSTs without a version), UniProt, ...
3. What stability do RefSeq stable IDs guarantee? Could you point me to any document defining which features I can assume to be stable for NM_ accessions, with and without a version?
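To make question 2 concrete, here is a minimal sketch of how I would treat versioned and unversioned IDs when comparing annotations. It assumes the usual `ENST00000418600.N` notation for human transcripts (ENST followed by 11 digits and an optional version suffix). The function names are my own, and the version numbers in the usage example are made-up placeholders, not real Ensembl versions.

```python
import re

# Human Ensembl transcript ID, optionally followed by ".<version>".
# Assumption: only human ENST IDs are handled; other species use other prefixes.
_ENST = re.compile(r"^(ENST\d{11})(?:\.(\d+))?$")


def split_versioned_id(identifier):
    """Return (stable_id, version); version is an int, or None if absent."""
    match = _ENST.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a human ENST identifier: {identifier!r}")
    stable_id, version = match.groups()
    return stable_id, (int(version) if version is not None else None)


def same_annotation(id_a, id_b):
    """True only if both IDs carry a version and agree on ID and version."""
    a = split_versioned_id(id_a)
    b = split_versioned_id(id_b)
    if a[1] is None or b[1] is None:
        # An unversioned ID says nothing about whether the sequence changed.
        return False
    return a == b


# Placeholder versions, for illustration only.
print(same_annotation("ENST00000418600.2", "ENST00000418600.3"))
print(same_annotation("ENST00000418600", "ENST00000418600"))
```

The second call deliberately returns `False`: without a version, two identical-looking IDs cannot be shown to refer to the same sequence.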
Thanks for any help!
Edit: Reformatted the links and picked better database versions
Hi Jean-Karim,
I think you did not see what I meant with my examples. Please correct me if I'm wrong.
I think you compared SYNGAP1-001 with SYNGAP1-001 in the two releases of Ensembl. Here you are correct: it looks like just a change in UTR length (I didn't check for other changes). I don't see a problem here, because a new ENST (ENST00000629380) was assigned in release 79 (even for a UTR change!).
The problem I see is that the old ENST of SYNGAP1-001 (ENST00000418600 in release 78) lives on in SYNGAP1-010 (ENST00000418600 in release 79), which is 58 aa shorter because an alternative splice site in exon 18 leads to a stop, leaving exon 19 untranslated. So there is a big change at both the exon and the protein level! I would expect that to count as a significant change between annotations, so the two should have different stable IDs, as you say.
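For anyone who wants to check their own ID lists for this kind of change, here is a rough sketch of the comparison I have in mind. It assumes you have already loaded the protein (or cDNA) sequences of two releases into dictionaries keyed by stable ID, for example from the FASTA dumps. The function names, IDs, and sequences below are placeholders of mine, not real Ensembl data.

```python
import hashlib


def seq_digest(sequence):
    """SHA-256 of a sequence, ignoring whitespace and letter case."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.sha256(cleaned.encode("ascii")).hexdigest()


def changed_ids(old_release, new_release):
    """Return the IDs present in both releases whose sequence differs."""
    shared = old_release.keys() & new_release.keys()
    return sorted(
        stable_id
        for stable_id in shared
        if seq_digest(old_release[stable_id]) != seq_digest(new_release[stable_id])
    )


# Placeholder IDs and toy sequences, for illustration only.
old = {"ENST_EXAMPLE_A": "MSTARTKLV", "ENST_EXAMPLE_B": "MQQPLL"}
new = {"ENST_EXAMPLE_A": "MSTAR", "ENST_EXAMPLE_B": "mqqpll", "ENST_EXAMPLE_C": "MAAA"}

print(changed_ids(old, new))  # ['ENST_EXAMPLE_A']
```

An ID reported here kept its name between the two releases but not its sequence, which is exactly the situation I describe for ENST00000418600.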
I would say the same.
I did look at ENST00000418600 in both releases, but I now see what you mean. However, it looks as if ENST00000418600 in release 78 and ENST00000418600 in release 79 are still annotated as producing the same UniProt entry (Q96PV0), although the Ensembl translation is different. I would ask the Ensembl helpdesk for clarification.
Hi Jean-Karim,
In UniProt, an entry can contain different isoforms of a protein, so it is common for many Ensembl transcripts with different protein sequence annotations to map to the same entry.
I already asked the Ensembl helpdesk, but have not received a helpful answer yet.
I did not find any information or discussion about this and assumed it might be of general interest, given the importance of stable reference sequences.
I thought it would be an easy one for Biostars.