Convert Ensembl transcript IDs (ENSMUST) to gene symbols using the mygene module in Python
10.8 years ago

I have a list of Ensembl transcript IDs:

ens = ['ENSMUST00000114694', 'ENSMUST00000028279', 'ENSMUST00000051301', 'ENSMUST00000038053', 'ENSMUST00000078988', 'ENSMUST00000115314', 'ENSMUST00000041124', 'ENSMUST00000022696', 'ENSMUST00000067689', 'ENSMUST00000162505', 'ENSMUST00000141755', 'ENSMUST00000111427', 'ENSMUST00000042868', 'ENSMUST00000160583', 'ENSMUST00000037182', 'ENSMUST00000131186']

I've downloaded and installed mygene:

import mygene

How do I convert these to the official gene symbols?

python ensembl • 24k views
ADD COMMENT
8.4 years ago
Newgene ▴ 370

It's easy to do that using the mygene Python module:

import mygene
mg = mygene.MyGeneInfo()
mg.querymany(ens, scopes='ensembl.transcript')

That's it!
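If you want a plain ID-to-symbol lookup rather than the raw list of hits, something like the sketch below may help. It assumes each hit is a dict with a `query` key, an optional `symbol` key, and `notfound: True` for IDs that did not match. The helper names `symbols_by_query` and `transcripts_to_symbols` are made up for this example, and `species="mouse"` is my assumption based on the ENSMUST prefix.

```python
def symbols_by_query(hits):
    """Map each queried ID to its gene symbol (None when nothing matched)."""
    mapping = {}
    for hit in hits:
        query = hit["query"]
        if hit.get("notfound"):
            # Record the ID so unmatched transcripts stay visible.
            mapping.setdefault(query, None)
        elif mapping.get(query) is None:
            # Keep the first symbol if an ID matches more than one gene.
            mapping[query] = hit.get("symbol")
    return mapping


def transcripts_to_symbols(transcript_ids, species="mouse"):
    """Query MyGene.info; needs network access and the mygene package."""
    import mygene  # imported here so symbols_by_query works without it

    mg = mygene.MyGeneInfo()
    hits = mg.querymany(
        transcript_ids,
        scopes="ensembl.transcript",
        fields="symbol",
        species=species,
    )
    return symbols_by_query(hits)
```

Calling `transcripts_to_symbols(ens)` should then give a dict keyed by the transcript IDs, with `None` for anything that was not found.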

ADD COMMENT

Thanks... This works great.

import mygene
mg = mygene.MyGeneInfo()
ens = ['ENSG00000148795', 'ENSG00000165359', 'ENSG00000150676']
ginfo = mg.querymany(ens, scopes='ensembl.gene')

# Print every field of each hit, with a blank line between hits
for g in ginfo:
    for k, v in g.items():
        print("- {0: <10}: {1}".format(k, v))
    print()

### OUTPUT ###

- name      : cytochrome P450 family 17 subfamily A member 1
- symbol    : CYP17A1
- taxid     : 9606
- entrezgene: 1586
- query     : ENSG00000148795
- _id       : 1586

- name      : integrator complex subunit 6 like
- symbol    : INTS6L
- taxid     : 9606
- entrezgene: 203522
- query     : ENSG00000165359
- _id       : 203522

- name      : coiled-coil domain containing 83
- symbol    : CCDC83
- taxid     : 9606
- entrezgene: 220047
- query     : ENSG00000150676
- _id       : 220047
ADD REPLY
10.8 years ago
GANI ▴ 230

I don't think you can use Ensembl transcript IDs with the mygene module. However, you can use it with Ensembl gene IDs. The following snippet shows an example using human gene identifiers:

import mygene
mg = mygene.MyGeneInfo()
geneList = ['ENSG00000148795', 'ENSG00000165359', 'ENSG00000150676']
geneSyms = mg.querymany(geneList, scopes='ensembl.gene', fields='symbol', species='human')

Check the following Gist for more information

http://nbviewer.ipython.org/gist/newgene/6771106

ADD COMMENT

Do you know of any way I can convert a list of transcript IDs into gene symbols? I usually use bioDBnet, but my list is ~20k IDs and it just freezes. biodb.jp isn't working for converting transcript IDs. I just need a way to convert these IDs.

ADD REPLY
10.8 years ago
Emily 24k

No idea how to work with mygene. I would recommend using the Ensembl Perl API, which you can learn about using this online course.

ADD COMMENT

I agree. I always prefer to go via the Ensembl API.

Alternatives would be BioMart or PyCogent: http://pycogent.org/

ADD REPLY

BioMart won't work for 20,000 IDs. It'll break down partway through your query, without warning, only giving you a partial dataset.
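Whatever service you end up using, one way to guard against that kind of silent truncation is to send the IDs in smaller batches and then check which IDs never came back. The sketch below is only an illustration: `convert_batch` is a hypothetical callable (list of IDs in, dict of ID to symbol out) standing in for whichever converter you use, and the batch size of 1000 is an arbitrary choice, not a documented limit of any service.

```python
def chunked(ids, size=1000):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def missing_ids(requested, returned):
    """Return the requested IDs that are absent from the results, in order."""
    seen = set(returned)
    return [i for i in requested if i not in seen]


def convert_in_batches(ids, convert_batch, size=1000):
    """Run the hypothetical `convert_batch` on each chunk and merge the results."""
    results = {}
    for batch in chunked(ids, size):
        results.update(convert_batch(batch))
    # The second value lists IDs that got no result, so partial output is visible.
    return results, missing_ids(ids, results)
```

If the second return value is not empty, you know the dataset is incomplete and which IDs to retry.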

ADD REPLY
22 months ago
Francesco • 0

Hello, for anyone looking for a lightweight Python tool for gene ID conversion, I developed this library.

ADD COMMENT
16 months ago
LayneSadler ▴ 90

Within the same ecosystem as mygene, you can use biothings_client:

from biothings_client import get_client
gene_client = get_client('gene')

gene_client.getgene('ENSG00000237801', fields='symbol')
"""
{'_id': 'ENSG00000237801', '_version': 1, 'symbol': 'AMD1P2'}
"""


gene_client.getgenes(['ENSG00000237801', 'ENSG00000210195'], fields='symbol')
"""
[
    {'query': 'ENSG00000237801', '_id': 'ENSG00000237801', '_version': 1, 'symbol': 'AMD1P2'},
    {'query': 'ENSG00000210195', '_id': '4576', '_version': 1, 'symbol': 'TRNT'}
]
"""
ADD COMMENT
