Question

Gene ID to Ensembl ID conversion with a long list

7.1 years ago

virlana.shchuka • 0

Hello everyone,

Is there an available resource that converts long lists of gene names to Ensembl IDs?

I cannot use BioMart, because its advised limit is 500 genes and I have several lists of more than 6,000 gene names each. I also cannot use DAVID, because it has no input option for plain gene names.

Thanks in advance!

Sincerely, Virlana.

Tag: gene

Written 7.1 years ago by virlana.shchuka; last updated 7.1 years ago by BioinfGuru.

score 2 · Answer 1 · 2018-05-30

If you have the mygene library installed in Python, you could use the following script, saved here as map_hgnc_to_ensg.py. It reads one HGNC symbol per line from standard input:

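#!/usr/bin/env python
"""Map HGNC gene symbols (one per line on stdin) to Ensembl gene IDs."""

import sys
import mygene

mg = mygene.MyGeneInfo()

# Read symbols from standard input, skipping blank lines
genes = [line.strip() for line in sys.stdin if line.strip()]

for gene in genes:
    # "symbol:" restricts the match to the official gene symbol field
    result = mg.query("symbol:%s" % gene, fields="ensembl.gene",
                      species="human", verbose=False)
    for hit in result.get("hits", []):
        ensembl = hit.get("ensembl", [])
        # "ensembl" is a dict for a single ID, or a list of dicts when the
        # symbol maps to several Ensembl genes; handle both cases
        if isinstance(ensembl, dict):
            ensembl = [ensembl]
        for entry in ensembl:
            if "gene" in entry:
                sys.stdout.write("%s\t%s\n" % (gene, entry["gene"]))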
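Querying MyGene.info one symbol at a time means one HTTP request per gene, which gets slow for lists of more than 6,000 names. mygene also provides `querymany`, which takes the whole list and sends it in batches. The sketch below assumes the same mygene installation; `ensembl_pairs` and `map_symbols` are hypothetical helper names, and the handling of `notfound` entries and list-valued `ensembl` fields reflects how querymany results are usually shaped, so it is worth checking against your own output.

```python
"""Batch mapping of HGNC symbols to Ensembl gene IDs with mygene's querymany (sketch)."""


def ensembl_pairs(results):
    """Yield (symbol, Ensembl gene ID) pairs from querymany-style results.

    Hypothetical helper. Assumes each result is a dict with a "query" key,
    unmatched symbols carry "notfound": True, and "ensembl" is either a dict
    or a list of dicts.
    """
    for res in results:
        if res.get("notfound"):
            continue  # symbol had no match
        ensembl = res.get("ensembl", [])
        if isinstance(ensembl, dict):
            ensembl = [ensembl]
        for entry in ensembl:
            if "gene" in entry:
                yield res["query"], entry["gene"]


def map_symbols(symbols):
    """Hypothetical helper: query all symbols at once, return (symbol, ENSG) pairs."""
    import mygene  # imported here so ensembl_pairs() can be used without mygene

    mg = mygene.MyGeneInfo()
    # querymany sends the list in batches instead of one request per symbol
    results = mg.querymany(symbols, scopes="symbol", fields="ensembl.gene",
                           species="human", verbose=False)
    return list(ensembl_pairs(results))
```

To use it, read the stripped, non-empty lines of hgnc.txt into a list, pass it to `map_symbols`, and print each pair separated by a tab.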

If you don't have mygene installed, you can install it with pip:

$ pip install mygene

As an example, suppose a file called "hgnc.txt" contains the following HGNC symbols:

DDX26B
CCDC83
MAST3
RPL11
ZDHHC20
LUC7L3
SNORD49A
CTSH
ACOT8

Running the script on that file (after making it executable with chmod +x map_hgnc_to_ensg.py) gives the following output:

$ ./map_hgnc_to_ensg.py < hgnc.txt
DDX26B  ENSG00000225235
DDX26B  ENSG00000165359
CCDC83  ENSG00000150676
MAST3   ENSG00000099308
RPL11   ENSG00000142676
ZDHHC20 ENSG00000180776
ZDHHC20 ENSG00000236953
LUC7L3  ENSG00000108848
SNORD49A        ENSG00000277370
CTSH    ENSG00000103811
ACOT8   ENSG00000101473

You could write the output to a text file like so:

$ ./map_hgnc_to_ensg.py < hgnc.txt > hgnc_mapped_to_ensg.txt
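Some symbols map to more than one Ensembl gene ID (DDX26B and ZDHHC20 above), so the output has one row per mapping. If you would rather have one row per symbol, a small post-processing step can group the IDs. This is a sketch; `collapse` is a hypothetical helper that assumes the two-column, tab-separated format written by the script.

```python
"""Collapse 'symbol<TAB>ensg' rows into one row per symbol (sketch)."""


def collapse(lines):
    """Hypothetical helper: group IDs by symbol, keeping first-seen order.

    Returns strings of the form 'symbol<TAB>id1,id2,...'.
    """
    grouped = {}  # dicts keep insertion order in Python 3.7+
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        symbol, ensg = line.split("\t", 1)
        ids = grouped.setdefault(symbol, [])
        if ensg not in ids:  # drop exact duplicate mappings
            ids.append(ensg)
    return ["%s\t%s" % (symbol, ",".join(ids)) for symbol, ids in grouped.items()]
```

Passing the open hgnc_mapped_to_ensg.txt file to `collapse` would, for the example above, yield one DDX26B row with its two IDs joined by a comma.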

score 2 · Answer 2 · 2018-05-30

7.1 years ago

Denise CS ★ 5.2k

There are other ways to use BioMart besides its web interface: the biomaRt Bioconductor R package, the BioMart Perl API, and BioMart RESTful access.
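Since the concern is BioMart's advised limit of 500 genes, one way to use its RESTful access with a long list is to split the symbols into chunks of 500 and send one query per chunk. This is a sketch that only builds the query URLs; it assumes Ensembl's public martservice endpoint and the `hsapiens_gene_ensembl` dataset with an `hgnc_symbol` filter and `ensembl_gene_id` attribute. Those names are assumptions to verify against the current BioMart configuration, and `chunks` and `build_query_urls` are hypothetical helpers.

```python
"""Build BioMart REST (martservice) query URLs for a long symbol list (sketch)."""

from urllib.parse import urlencode

# Assumed endpoint: Ensembl's public BioMart martservice
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"

# Assumed dataset, filter and attribute names; check them before use
QUERY_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="hgnc_symbol" value="{symbols}"/>
    <Attribute name="hgnc_symbol"/>
    <Attribute name="ensembl_gene_id"/>
  </Dataset>
</Query>"""


def chunks(items, size=500):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_query_urls(symbols, size=500):
    """Return one martservice GET URL per chunk of symbols."""
    urls = []
    for chunk in chunks(symbols, size):
        # Filter values are given as a comma-separated list
        xml = QUERY_TEMPLATE.format(symbols=",".join(chunk))
        urls.append(MARTSERVICE + "?" + urlencode({"query": xml}))
    return urls
```

Each URL could then be fetched (for example with urllib.request.urlopen) and the tab-separated responses concatenated into one mapping file.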

score 0 · Answer 3 · 2018-05-30

7.1 years ago

cpad0112 21k

See the Biostars thread "convert human gene names to ensembl ID".

The asker has noted that they cannot use BioMart.

Reply by Alex Reynolds, 7.1 years ago

Well, that thread lists around four approaches, in addition to a link to the most comprehensive post on ID conversion (https://www.biostars.org/p/22/):

Python
R: three different ways (using mygene, ensembldb, pathview)
A user-developed tool
BioMart

Even if BioMart is excluded, that still leaves five methods, three of them in R.

Reply by cpad0112, 7.1 years ago