Question

Retrieve gene names from gene symbols using Entrez?

0

8.5 years ago

randalljellis ▴ 90

Hello,

Is it possible to use any of the Entrez tools to query a gene symbol and retrieve the gene name? For example, query "DRD1" and get "dopamine receptor D1" back. If it can't be done with Entrez but can be done some other way, I would gladly use that instead!

Thank you in advance, you're all BioStars :)

entrez biopython gene symbol gene • 2.4k views

2

8.5 years ago

randalljellis ▴ 90

Thanks guys! Here's the program I ended up writing in case anyone wants to do the same thing!

from Bio import Entrez
import csv


def gene_alias(gene_file):
    Entrez.email = "your email"
    # one gene symbol per line; blank lines are skipped
    with open(gene_file) as infile:
        genes = [line.strip() for line in infile if line.strip()]

    ids = []
    aliases = []
    marker = 'This record was replaced with GeneID:'

    for gene in genes:
        # retrieve gene ID
        handle = Entrez.esearch(db="gene", term="Mus musculus[Orgn] AND " + gene + "[Gene]")
        record = Entrez.read(handle)
        handle.close()

        if len(record["IdList"]) > 0:
            ids.append(record["IdList"][0])

            # retrieve aliases
            record_with_aliases = Entrez.efetch(db="gene", id=record["IdList"][0], retmode="json")
            entry_lines = record_with_aliases.read().splitlines()

            # follow "replaced with GeneID" notices until a current record is returned
            replaced = [line for line in entry_lines if marker in line]
            while replaced:
                new_id = replaced[0].split(marker)[1].strip()
                record_with_aliases = Entrez.efetch(db="gene", id=new_id, retmode="json")
                entry_lines = record_with_aliases.read().splitlines()
                replaced = [line for line in entry_lines if marker in line]

            # the symbol is expected on the second line, after a 3-character prefix
            firstline = entry_lines[1]
            if gene.lower() == firstline[3:].lower():
                thirdline = entry_lines[3]
                fourthline = entry_lines[4]
                if thirdline[0:13] == 'Other Aliases':
                    aliases.append(thirdline[15:])
                elif fourthline == 'This record was discontinued.':
                    aliases.append(fourthline)
                else:
                    aliases.append('no aliases')
            else:
                # the query matched an alias, so report the symbol Gene returned
                aliases.append(firstline[3:])

        else:
            ids.append(gene + ' is not in Gene')
            aliases.append(gene + ' is not in Gene')

    # text mode with newline='' is what the csv module expects on Python 3
    rows = zip(genes, ids, aliases)
    with open('gene_aliases.csv', 'w', newline='') as thefile:
        writer = csv.writer(thefile)
        writer.writerow(['Gene', 'ID', 'Aliases'])
        for row in rows:
            writer.writerow(row)

score 2 · Accepted Answer · 2016-06-28

2

8.5 years ago

WouterDeCoster 47k

As is very often the case, the answer is probably Ensembl's BioMart :)

2

8.5 years ago

GenoMax 148k

It can probably be done using eUtils, or by parsing the correct gene name (there is more than one DRD1 gene) from this file.
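For anyone wanting to try the eUtils route from Python, here is a minimal sketch using Biopython's Entrez.esearch and Entrez.esummary. The function names (symbol_and_description, gene_name) are made up for illustration, and the record layout (DocumentSummarySet / DocumentSummary with Name and Description keys) is an assumption about what Biopython returns for the gene database, so check it against a real response before relying on it.

```python
def symbol_and_description(record):
    """Return (symbol, description) pairs from a parsed gene esummary record.

    Assumes the record layout described above; adjust if your response differs.
    """
    docs = record["DocumentSummarySet"]["DocumentSummary"]
    return [(str(doc["Name"]), str(doc["Description"])) for doc in docs]


def gene_name(symbol, organism="Homo sapiens", email="your email"):
    """Look up the full name for one gene symbol (needs network access)."""
    # imported here so the parsing helper above works without Biopython
    from Bio import Entrez

    Entrez.email = email
    term = "{}[Orgn] AND {}[Gene]".format(organism, symbol)

    # step 1: symbol -> Gene ID
    handle = Entrez.esearch(db="gene", term=term)
    id_list = Entrez.read(handle)["IdList"]
    handle.close()
    if not id_list:
        return None

    # step 2: Gene ID -> document summary, which carries the description
    handle = Entrez.esummary(db="gene", id=id_list[0])
    record = Entrez.read(handle)
    handle.close()

    pairs = symbol_and_description(record)
    return pairs[0][1] if pairs else None
```

Calling gene_name("DRD1") should then give back the description NCBI stores for the first matching record. Because several organisms have a DRD1 gene, restricting the search with the organism argument matters.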

2

8.5 years ago

Ming Tommy Tang ★ 4.5k

I have two blog posts on this: http://crazyhottommy.blogspot.com/2014/09/converting-gene-ids-using-bioconductor.html and http://crazyhottommy.blogspot.com/2014/09/mapping-gene-ids-with-mygene.html
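As a rough idea of what the mygene approach looks like in Python, here is a small sketch. It is not taken from the blog posts. It assumes the third-party mygene package is installed, and the helper names (names_from_hits, lookup_names) are hypothetical. The hits are treated as a list of dicts with a "query" key and, when a match was found, a "name" key.

```python
def names_from_hits(hits):
    """Map each queried symbol to its gene name (None when nothing matched)."""
    names = {}
    for hit in hits:
        # keep only the first hit per symbol; later hits are ignored
        names.setdefault(hit["query"], hit.get("name"))
    return names


def lookup_names(symbols, species="human"):
    """Query MyGene.info for a list of symbols (needs network access)."""
    # imported here so names_from_hits can be used without the package
    import mygene

    mg = mygene.MyGeneInfo()
    hits = mg.querymany(symbols, scopes="symbol", fields="name", species=species)
    return names_from_hits(hits)
```

Usage would be something like lookup_names(["DRD1"]), which returns a dict keyed by the symbols you passed in.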
