Programmatic Technique For Gene-Name/ID Conversion
14.7 years ago
Will 4.6k

Does anyone know of a good gene-ID conversion tool written in Python? I've come across numerous web tools, but I'd like something a little more automated. I have the knowledge and ability to do it myself; I was just wondering if there was something already out there. There's no point in reinventing the wheel each time.

Thanks in advance

python conversion • 9.8k views

Which IDs do you want to convert, exactly? I don't think any single service can convert between all the most commonly used gene IDs; even BioMart and UniProt lack some databases. Look at this question.

14.7 years ago

Writing a small tool to automate access to the website/service is pretty simple. Here's a method I wrote for the UniProt ID mapping service:

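import urllib.parse
import urllib.request

def uniprot_mapping(fromtype, totype, identifier):
    """Map one identifier with the UniProt ID mapping service.

    Returns the service's tab-separated response as text.
    """
    base = 'http://www.uniprot.org'
    tool = 'mapping'
    params = {'from': fromtype,
              'to': totype,
              'format': 'tab',
              'query': identifier,
              }
    # Encode the parameters into a query string and build the request URL
    data = urllib.parse.urlencode(params)
    url = base + '/' + tool + '?' + data
    # Fetch the result and decode the raw bytes into text
    response = urllib.request.urlopen(url)
    return response.read().decode('utf-8')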

It's not extensively tested, but should work. You can find a list of fromtypes and totypes here: http://www.uniprot.org/faq/28#id_mapping_examples
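For anyone wanting to use the returned text directly, here is a minimal sketch of turning tab-separated output into a dictionary. It assumes the response is a header row followed by two-column "From<TAB>To" rows; that layout, the helper name parse_mapping, and the sample identifiers are all illustrative assumptions, not something the service guarantees.

```python
import csv
import io

def parse_mapping(tab_text):
    """Turn tab-separated 'From<TAB>To' text into {from_id: [to_id, ...]}.

    Assumes the first line is a header row (an assumption about the format).
    """
    mapping = {}
    reader = csv.reader(io.StringIO(tab_text), delimiter='\t')
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        # One source ID can map to several targets, so collect a list
        mapping.setdefault(row[0], []).append(row[1])
    return mapping

# Made-up sample text for illustration only (not real service output)
sample = "From\tTo\nID_A\tX1\nID_A\tX2\nID_B\tY1\n"
print(parse_mapping(sample))
```

Keeping a list per source ID avoids silently dropping one-to-many mappings, which are common when converting between gene and protein identifiers.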


Oh and also, Tim Yates has written a Groovy version of my code, above: http://gist.github.com/330312


I should note that DAVID has an API: http://david.abcc.ncifcrf.gov/content.jsp?file=DAVID_API.html but it doesn't seem to cover their ID mapping service.

11.1 years ago
Andrew Su 4.9k

Check out the Python library for mygene.info. For example:

In [1]: import mygene

In [2]: mg = mygene.MyGeneInfo()

In [3]: mg.getgene(1017)
Out[3]:
{'_id': '1017',
 'entrezgene': 1017,
 'name': 'cyclin-dependent kinase 2',
 'symbol': 'CDK2',
 'taxid': 9606}

In [4]:  mg.query('cdk2', size=2)
Out[4]:
{'hits': [{'_id': '1017',
   '_score': 373.24667,
   'entrezgene': 1017,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'CDK2',
   'taxid': 9606},
  {'_id': '12566',
   '_score': 353.90176,
   'entrezgene': 12566,
   'name': 'cyclin-dependent kinase 2',
   'symbol': 'Cdk2',
   'taxid': 10090}],
 'max_score': 373.24667,
 'took': 10,
 'total': 28}
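Since a symbol query can return hits from several species, it may help to flatten the result into a small conversion table. The sketch below works on a dictionary shaped like the Out[4] result above (trimmed to the fields used); hits_to_table is a hypothetical helper name, not part of the mygene library.

```python
def hits_to_table(result):
    """Collect (taxid, symbol, entrezgene) tuples from a query result dict."""
    table = []
    for hit in result.get('hits', []):
        # Assumption: some hits may lack an Entrez ID; skip those
        if 'entrezgene' in hit:
            table.append((hit.get('taxid'), hit.get('symbol'), hit['entrezgene']))
    return table

# Copied from the Out[4] output above, trimmed to the fields used here
result = {'hits': [{'_id': '1017', 'entrezgene': 1017,
                    'symbol': 'CDK2', 'taxid': 9606},
                   {'_id': '12566', 'entrezgene': 12566,
                    'symbol': 'Cdk2', 'taxid': 10090}],
          'total': 28}

for taxid, symbol, entrez in hits_to_table(result):
    print(taxid, symbol, entrez)
```

Keeping the taxid next to each symbol makes it clear that CDK2 (human, 9606) and Cdk2 (mouse, 10090) are different genes with different Entrez IDs.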

A few nice features of mygene.info:

  • automated updates (minimum weekly) from NCBI Entrez, Ensembl, UniProt, NetAffy, and PharmGKB
  • can be easily configured to incorporate additional gene-centric resources, so ask if you see something missing
  • the API is fast and can handle lots of concurrency, so hit it as hard as you want (up to, say, 5 queries per second)
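To stay within the roughly 5 queries per second mentioned above, a client-side throttle is one option. This is only a sketch: the Throttle class is a hypothetical helper, not part of mygene, and it simply spaces calls at least 1/rate seconds apart.

```python
import time

class Throttle:
    """Space out calls so that at most `rate` of them happen per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.last_call = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now

throttle = Throttle(rate=5)
for gene_id in [1017, 12566]:
    throttle.wait()
    # A real lookup, e.g. mg.getgene(gene_id), would go here
```

Even spacing is stricter than necessary (it forbids short bursts), but it is simple and keeps the request rate at or below the limit.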

More information in this publication and on these blog posts.

14.7 years ago
Will 4.6k

In case anyone comes by this later, I've made a simple Python module for doing this sort of conversion. You can find it on GitHub: http://github.com/JudoWill/IDConverter

Feel free to make comments and provide suggestions.


The link is broken :/ UPDATE: okay, now I see the date of this post.

14.7 years ago

You could automate the access to the website with Python ;-)

11.1 years ago
Prakki Rama ★ 2.7k

Have you come across bioDBnet?
