You could use the Python mygene package with the ensembl.transcript scope:
#!/usr/bin/env python
import sys
import mygene

mg = mygene.MyGeneInfo()

# Read one Ensembl transcript ID per line from standard input
names = []
for line in sys.stdin:
    names.append(line.rstrip())

# Query each ID separately and print ID<TAB>symbol for every hit that has a symbol
for name in names:
    result = mg.query(name, scopes="ensembl.transcript", fields=["symbol"], species="mouse", verbose=False)
    ensembl_name = name
    for hit in result["hits"]:
        if "symbol" in hit:
            sys.stdout.write("%s\t%s\n" % (ensembl_name, hit["symbol"]))
Given a text file like names.txt:
ENSMUST00000143813
ENSMUST00000099042
ENSMUST00000073363
You could run this script like so:
$ ./map_ensembl_transcripts_to_hgnc_symbols_mm10.py < names.txt
ENSMUST00000143813 0610009L18Rik
ENSMUST00000099042 Gm10717
ENSMUST00000073363 Amtn
Installation instructions: https://pypi.org/project/mygene/
Full listing of scopes/fields here: http://docs.mygene.info/en/latest/doc/query_service.html#available-fields
Edit: If you have a lot of genes/transcripts/etc. in names.txt, then you may instead want to use querymany(), which queries all genes/transcripts/etc. at once, instead of running one query() call per gene/transcript/etc.:
#!/usr/bin/env python
import sys
import mygene

mg = mygene.MyGeneInfo()

# Read one Ensembl transcript ID per line from standard input
names = []
for line in sys.stdin:
    names.append(line.rstrip())

# Send all IDs in one batched request; each result carries its original 'query'
results = mg.querymany(names, scopes='ensembl.transcript', fields='symbol', species='mouse', verbose=False)
for res in results:
    if 'symbol' in res:
        sys.stdout.write("%s\t%s\n" % (res['query'], res['symbol']))
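Note that the querymany() loop above silently skips any ID that comes back without a symbol. If you want to see which IDs failed to map, a small helper along these lines might be useful. This is only a sketch: group_symbols() is a hypothetical name, not part of mygene, and the example list is hand-written to imitate the shape of querymany() output (a list of dicts with a 'query' key, plus 'symbol' for hits or 'notfound' for misses) rather than taken from a real API response.

```python
def group_symbols(results):
    """Map each queried ID to the list of distinct symbols returned for it.

    `results` is assumed to look like the list returned by querymany():
    one dict per hit, each with a 'query' key and, when mapped, a 'symbol' key.
    IDs with no symbol end up with an empty list.
    """
    symbols = {}
    for res in results:
        found = symbols.setdefault(res["query"], [])
        if "symbol" in res and res["symbol"] not in found:
            found.append(res["symbol"])
    return symbols


# Hand-written illustration of the assumed result shape (not real API output)
example_results = [
    {"query": "ENSMUST00000099042", "symbol": "Gm10717"},
    {"query": "ENSMUST00000073363", "symbol": "Amtn"},
    {"query": "NOT_A_REAL_ID", "notfound": True},
]

grouped = group_symbols(example_results)
unmapped = [query for query, found in grouped.items() if not found]

for query, found in grouped.items():
    # Print "NA" for IDs that did not map to any symbol
    print("%s\t%s" % (query, ",".join(found) if found else "NA"))
```

In the script above you would pass the results list from mg.querymany() instead of example_results; unmapped then holds the IDs worth checking by hand (for example, retired transcript IDs).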
This is the mouse genome, so you should be able to use the answer posted a couple of days ago here: A: How to add gene symbol to RNA-Seq data using R