Ensembl BioMart: one Ensembl transcript ID maps to several UniProtKB IDs
5.7 years ago
Chat-ra ▴ 40

Hello everyone!

I want to map Ensembl IDs to UniProtKB IDs. To do that, I downloaded a file from Ensembl BioMart here:

https://www.ensembl.org/biomart/martview/4cab87e631cda01b9def14f09cc2021f

The chosen database is: “Ensembl Genes 95” and the chosen dataset is “Human Genes (GRCh38.p12)”

The chosen attributes are: Gene stable ID, Transcript stable ID, Protein stable ID, Gene name, and UniProtKB Gene Name ID.

Here is my problem: for some Transcript stable IDs, the downloaded file lists several UniProtKB IDs, but the web version of Ensembl shows just one.

For example, for the transcript ENST00000256186, the downloaded file contains all of these rows:

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PKW5

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PKI3

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PNC3

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PL42

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, A0A2R8YFA9

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PJB0

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PRE0

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, Q6ZW33

ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, O94851

But on the web site: https://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000133816;r=11:12094008-12359144

For the transcript ENST00000256186 there is only one result: Q6ZW33.

Does anyone know why I get different results?

Thank you for any suggestions!!!


This gene has multiple transcripts, and UniProt seems to assign them unique IDs.

@Emily_Ensembl noted in a previous comment on a different human gene that the annotation philosophies of these two databases differ (C: Ensembl ID conversion - other than Uniprot).

5.7 years ago
Denise CS ★ 5.2k

The table you point to in the Ensembl browser has a protein column called UniProt. It lists the protein IDs from UniProtKB that match one of the translations of the Ensembl gene/transcript.

This means that the column can show either SwissProt IDs (e.g. O94851 and Q6ZW33) or TrEMBL IDs (e.g. E9PNC3 and E9PKI3). You can only tell whether an ID is SwissProt or TrEMBL by clicking on it and exploring the UniProt website.

In your BioMart query, I'd have picked both UniProtKB/SwissProt ID and UniProtKB/TrEMBL ID as attributes instead of the one you've chosen i.e. UniProtKB Gene Name ID.
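For anyone who would rather script the download than click through the web form, here is a minimal sketch that builds the same kind of query as BioMart XML. The internal names used below (`hsapiens_gene_ensembl`, `uniprotswissprot`, `uniprotsptrembl`, and the other attribute names) and the `martservice` URL are assumptions that do not come from this thread, so check them against the attribute list in the BioMart interface. Note also that the `www` host serves the current release, so results may differ from Ensembl Genes 95. The helper functions are hypothetical names for illustration only.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed internal BioMart names; verify them in the BioMart attribute list.
DATASET = "hsapiens_gene_ensembl"
ATTRIBUTES = [
    "ensembl_gene_id",        # Gene stable ID
    "ensembl_transcript_id",  # Transcript stable ID
    "ensembl_peptide_id",     # Protein stable ID
    "external_gene_name",     # Gene name
    "uniprotswissprot",       # UniProtKB/SwissProt ID (assumed name)
    "uniprotsptrembl",        # UniProtKB/TrEMBL ID (assumed name)
]


def build_query_xml(dataset, attributes, transcript_id=None):
    """Return a BioMart XML query string, optionally filtered to one transcript."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE Query>",
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" count="" datasetConfigVersion="0.6">',
        f'  <Dataset name={quoteattr(dataset)} interface="default">',
    ]
    if transcript_id is not None:
        lines.append(
            f'    <Filter name="ensembl_transcript_id" value={quoteattr(transcript_id)}/>'
        )
    lines.extend(f"    <Attribute name={quoteattr(name)}/>" for name in attributes)
    lines += ["  </Dataset>", "</Query>"]
    return "\n".join(lines)


def build_query_url(xml, base="https://www.ensembl.org/biomart/martservice"):
    """URL-encode the XML into the 'query' parameter (assumed endpoint)."""
    return base + "?" + urlencode({"query": xml})


query_xml = build_query_xml(DATASET, ATTRIBUTES, "ENST00000256186")
query_url = build_query_url(query_xml)
print(query_xml)
```

The sketch only builds the request and does not send it. Fetching `query_url` with any HTTP client should return tab-separated rows if the assumed names are right.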

I'm not sure what UniProtKB Gene Name ID is. All I know is that it's confusing, as you've seen yourself: lots and lots of different UniProtKB Gene Name IDs for one transcript. For ENST00000256186, there are 9 UniProtKB Gene Name IDs:

E9PKW5 E9PKI3 E9PNC3 E9PL42 A0A2R8YFA9 E9PJB0 E9PRE0 Q6ZW33 O94851

If you search for each of those in UniProt, you will see that most are unreviewed entries (TrEMBL); only two, Q6ZW33 and O94851, are reviewed entries (SwissProt).
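To check a downloaded file for transcripts like this one, a small sketch along these lines may help. It groups the rows from the question by transcript and splits the IDs into reviewed and unreviewed. The reviewed set is hard-coded from what is stated here (Q6ZW33 and O94851); a real script would need to look up the review status in UniProt. The function name `group_by_transcript` is hypothetical.

```python
import csv
from collections import defaultdict
from io import StringIO

# Rows copied from the question (comma-separated BioMart export).
DOWNLOAD = """\
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PKW5
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PKI3
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PNC3
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PL42
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, A0A2R8YFA9
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PJB0
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, E9PRE0
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, Q6ZW33
ENSG00000133816, ENST00000256186, ENSP00000256186, MICAL2, O94851
"""

# Reviewed (SwissProt) entries as stated in this answer; hard-coded assumption.
REVIEWED = {"Q6ZW33", "O94851"}


def group_by_transcript(text):
    """Map each transcript ID to its UniProt IDs, in order of first appearance."""
    groups = defaultdict(list)
    for row in csv.reader(StringIO(text), skipinitialspace=True):
        if len(row) != 5:
            continue  # skip blank or malformed lines
        _gene, transcript, _protein, _name, uniprot = (field.strip() for field in row)
        if uniprot and uniprot not in groups[transcript]:
            groups[transcript].append(uniprot)
    return dict(groups)


groups = group_by_transcript(DOWNLOAD)
ids = groups["ENST00000256186"]
reviewed = [i for i in ids if i in REVIEWED]
unreviewed = [i for i in ids if i not in REVIEWED]
print(len(ids), "IDs;", "reviewed:", reviewed, "unreviewed:", unreviewed)
```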

Also, note that there seems to be a problem with Q6ZW33: UniProt names it MICALCL rather than MICAL2, although both names share the same Ensembl gene ID. It is worth reporting this to UniProt.


Okay, thanks for your response! I will change the attributes to UniProtKB/SwissProt ID and UniProtKB/TrEMBL ID!

The MICALCL/MICAL2 case confuses me because GeneCards lists MICAL2 as an alias for MICALCL ( https://www.genecards.org/cgi-bin/carddisp.pl?gene=MICALCL&keywords=MICAL2 ). However, each has its own NCBI page with a separate Entrez Gene ID (9645 for MICAL2 and 84953 for MICALCL), yet they seem to share the same Ensembl gene ID. I will try to report it to UniProt.

EDIT: I've downloaded the BioMart file with the attributes UniProtKB/SwissProt ID and UniProtKB/TrEMBL ID, and I find the same results as on the Ensembl browser! Thanks a lot!


Please do get in touch with HGNC (HUGO Gene Nomenclature Committee) and the Ensembl helpdesk regarding this.

HGNC has MICALCL and MICAL2 as two different loci. Maybe this is historical, but all those resources should agree on the official gene symbol and correct the mapping to external references. For example HGNC maps MICALCL to ENSG00000133808, now deprecated.

Entrez Gene has two different gene IDs for MICAL2 and MICALCL, but both map to ENSG00000133816.

NCBI and EMBL-EBI are working together on a new project, MANE (Matched Annotation from NCBI and EMBL-EBI). Please email their helpdesk so that they can advise on this and make sure the entries really match.


In case someone has the same question, here is the response from the HGNC help desk:

"Currently NCBI represent MICAL2 and MICALCL as two separate adjacent genes, see https://www.ncbi.nlm.nih.gov/gene/9645 and https://www.ncbi.nlm.nih.gov/gene/84953

However the GENCODE annotation shown in Ensembl has recently merged these loci into one gene (ENSG00000133816) which they have retained the MICAL2 symbol for.

So the two annotation resources differ in their annotation of this region - one has it as two separate genes, while the other has one long gene. Hence until this discrepancy has been resolved we are retaining both MICAL2 and MICALCL as separate genes."
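If you want to spot other loci where the two resources disagree in the same way, one rough approach is to look for Ensembl gene IDs that several Entrez Gene IDs point to. The sketch below uses only the two mappings mentioned in this thread. The function name `find_merged_loci` and the table layout are hypothetical; a real cross-reference table would have to come from an NCBI or Ensembl export.

```python
from collections import defaultdict

# (symbol, Entrez Gene ID) -> Ensembl gene ID, as reported in this thread.
ENTREZ_TO_ENSEMBL = {
    ("MICAL2", 9645): "ENSG00000133816",
    ("MICALCL", 84953): "ENSG00000133816",
}


def find_merged_loci(mapping):
    """Return Ensembl gene IDs that more than one Entrez gene maps to."""
    by_ensembl = defaultdict(list)
    for (symbol, entrez_id), ensembl_id in mapping.items():
        by_ensembl[ensembl_id].append((symbol, entrez_id))
    return {
        ensembl_id: genes
        for ensembl_id, genes in by_ensembl.items()
        if len(genes) > 1
    }


merged = find_merged_loci(ENTREZ_TO_ENSEMBL)
for ensembl_id, genes in merged.items():
    print(ensembl_id, "<-", genes)
```

A hit only means the mapping is many-to-one. Whether that reflects a merged locus, as here, or something else still needs a manual look.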
