How to map old Ensembl Gene IDs to HGNC symbols and Entrez IDs
3.3 years ago

I have a list of Ensembl version 74 gene IDs that I need to convert to HGNC symbols. What would be the best way to go about this? Originally I used biomaRt with Ensembl 75 (74 wasn't available) to perform the mapping; however, several of the symbols in the resulting mapping were no longer the official HGNC symbols for the genes, but aliases. Should I use the current version of Ensembl instead, or maybe use both (the current version for IDs that are still present in it, 75 for those that aren't)? Also, if I wanted to convert the Ensembl IDs to Entrez IDs, what would be the best way to do it?
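For illustration, the "use both" idea can be sketched in Python as a two-tier lookup: take the symbol from the current release when the ID is still there, and fall back to the archived release otherwise. This is only a sketch under assumptions: the two dictionaries stand in for tables already exported from the two releases (for example with biomaRt), and all IDs, symbols, and the function name `merge_mappings` are placeholders, not real data or an existing API.

```python
def merge_mappings(gene_ids, current_release, archive_release):
    """Map each Ensembl gene ID to (symbol, source), preferring the current release.

    current_release / archive_release: dicts of {ensembl_gene_id: hgnc_symbol},
    assumed to have been exported beforehand from each Ensembl release.
    """
    merged = {}
    for gene_id in gene_ids:
        if current_release.get(gene_id):
            # ID still exists and has a symbol in the current release.
            merged[gene_id] = (current_release[gene_id], "current")
        elif archive_release.get(gene_id):
            # ID only resolvable in the archive; symbol may be outdated.
            merged[gene_id] = (archive_release[gene_id], "archive")
        else:
            merged[gene_id] = (None, "unmapped")
    return merged


# Placeholder tables (hypothetical IDs and symbols, for illustration only).
current_release = {"ENSG_A": "GENE_A"}
archive_release = {"ENSG_A": "GENE_A_ALIAS", "ENSG_B": "GENE_B_OLD"}

for gene_id, (symbol, source) in merge_mappings(
    ["ENSG_A", "ENSG_B", "ENSG_C"], current_release, archive_release
).items():
    print(gene_id, symbol, source)
```

Keeping the source alongside each symbol makes it easy to see afterwards which symbols came from the archive and may need checking against HGNC.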

mapping Ensembl • 3.0k views

Ensembl IDs to Entrez ID

Use: Ensembl ID to ENTREZ best converter


I am currently using biomaRt to convert the IDs; I'm just not sure whether I should use an archived version (75) of Ensembl or not.

3.3 years ago

The most accurate way would be to take the sequences corresponding to each source ID and map them to the sequences in the target resource(s). This way you avoid issues with versions. Otherwise, use the xrefs from Ensembl v74, which is still available on the Ensembl FTP site. You can recreate the database locally and use the Perl API to interact with it. Note that you would still get outdated HGNC symbols and Entrez IDs. You would then need to follow whatever historical ID mapping these resources provide (if any) to bring them up to date.
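As a rough sketch of that last step for HGNC symbols, in Python: check each symbol against the set of currently approved symbols, and if it is not approved, look it up in a table of previous symbols. Everything here is an assumption for illustration: the `approved` set and `previous_to_approved` table would have to be built from a download of the HGNC data, and the symbols and the function name `update_symbol` are placeholders.

```python
def update_symbol(symbol, approved, previous_to_approved):
    """Return (current_symbol, status) for a possibly outdated symbol.

    approved: set of currently approved symbols.
    previous_to_approved: dict of {old_symbol: set of approved symbols}.
    """
    if symbol in approved:
        return symbol, "approved"
    candidates = previous_to_approved.get(symbol, set())
    if len(candidates) == 1:
        # Exactly one successor: safe to replace.
        return next(iter(candidates)), "updated"
    if len(candidates) > 1:
        # An old symbol can point to several genes; leave for manual review.
        return None, "ambiguous"
    return None, "not_found"


# Placeholder data (hypothetical symbols, for illustration only).
approved = {"GENE_A", "GENE_B", "GENE_C"}
previous_to_approved = {
    "GENE_A_OLD": {"GENE_A"},
    "SHARED_OLD": {"GENE_B", "GENE_C"},
}

for sym in ["GENE_A", "GENE_A_OLD", "SHARED_OLD", "UNKNOWN"]:
    print(sym, update_symbol(sym, approved, previous_to_approved))
```

Ambiguous cases are deliberately not resolved automatically, since picking one of several successors silently would reintroduce the kind of mapping error this is meant to avoid.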


BioMart is available on the version 75 Ensembl archive site. It is not clear whether the OP has used it and found that several of the symbols returned are no longer the official HGNC ones.


I have used it but unfortunately did find that several of the symbols returned were no longer the official HGNC ones :(


Ok, thanks, I'll try that out! Also, out of curiosity, I used the Ensembl ID history converter to check whether any of my Ensembl 74 IDs that are not in release 104 map to a different ID in 104. None did. If I am only interested in genes that are still in the 104 release (I'm going to be performing GSEA, I trust the newest Ensembl annotation the most, and therefore I don't want to include genes that are most likely errors), would it be okay to just discard those genes and use biomaRt with Ensembl 104 to map the others?


I haven't used the ID history tool in a long while, but in the past I had mixed results with it, which is why I advocate mapping sequences. It's probably OK to discard old genes based on ID history if this doesn't throw away too many.
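To judge whether "too many" genes would be thrown away, a small check like the following Python sketch could help. It assumes you already have the set of gene IDs present in the current release (for example exported via biomaRt); the IDs and the function names are placeholders. Version suffixes such as `.3` are stripped before comparison, since the same stable ID can carry different version numbers across releases.

```python
def strip_version(gene_id):
    """Drop a trailing '.N' version suffix from a stable ID, if present."""
    return gene_id.split(".", 1)[0]


def split_by_release(gene_ids, current_ids):
    """Split IDs into those kept and those dropped, plus the dropped fraction.

    current_ids: set of unversioned gene IDs present in the current release.
    """
    kept, dropped = [], []
    for raw_id in gene_ids:
        gene_id = strip_version(raw_id)
        if gene_id in current_ids:
            kept.append(gene_id)
        else:
            dropped.append(gene_id)
    fraction_dropped = len(dropped) / len(gene_ids) if gene_ids else 0.0
    return kept, dropped, fraction_dropped


# Placeholder IDs (hypothetical, for illustration only).
old_ids = ["ENSG_A.3", "ENSG_B", "ENSG_C", "ENSG_D"]
current_ids = {"ENSG_A", "ENSG_B", "ENSG_C"}

kept, dropped, fraction_dropped = split_by_release(old_ids, current_ids)
print(f"kept {len(kept)}, dropped {len(dropped)} ({fraction_dropped:.1%})")
```

What counts as an acceptable fraction is a judgment call for the analysis at hand; the sketch only reports the number so the decision is made with it in view.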
