Get gene names from ensembl ID or gene region
6.3 years ago
L. A. Liggett ▴ 130

I have some RNA-Seq data with FPKM values labeled with genome positions (e.g. chr7 52823165 52830546), Ensembl IDs (e.g. ENSMUST00000143813), and gene symbols (e.g. 0610005C13Rik). Is there a good programmatic way in Python to get gene names from any of these, so I can match the FPKM values to the actual altered genes?

RNA-Seq Ensembl Python • 6.2k views

This is the mouse genome, so you should be able to use the answer posted a couple of days ago here: A: How to add gene symbol to RNA-Seq data using R

6.3 years ago

You could use the Python mygene package with the ensembl.transcript scope:

#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

names = []
for line in sys.stdin:
    names.append(line.rstrip())

for name in names:
    result = mg.query(name, scopes="ensembl.transcript", fields=["symbol"], species="mouse", verbose=False)
    ensembl_name = name
    for hit in result["hits"]:
        if "symbol" in hit:
            sys.stdout.write("%s\t%s\n" % (ensembl_name, hit["symbol"]))

Given a text file like names.txt:

ENSMUST00000143813
ENSMUST00000099042
ENSMUST00000073363

You could run this script like so:

$ ./map_ensembl_transcripts_to_hgnc_symbols_mm10.py < names.txt
ENSMUST00000143813      0610009L18Rik
ENSMUST00000099042      Gm10717
ENSMUST00000073363      Amtn

Installation instructions: https://pypi.org/project/mygene/

Full listing of scopes/fields here: http://docs.mygene.info/en/latest/doc/query_service.html#available-fields

Edit: If names.txt contains many identifiers, you may instead want to use querymany(), which submits them all in one call rather than issuing one query() call per identifier:

#!/usr/bin/env python

import sys
import mygene

mg = mygene.MyGeneInfo()

names = []
for line in sys.stdin:
    names.append(line.rstrip())

results = mg.querymany(names, scopes='ensembl.transcript', fields='symbol', species='mouse', verbose=False)
for res in results:
    if 'symbol' in res:
        sys.stdout.write("%s\t%s\n" % (res['query'], res['symbol']))
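
To get from the symbol mapping back to the FPKM table, one option is to turn the querymany() results into a dictionary and annotate each row. This is a minimal sketch rather than part of the original answer: the helper names build_symbol_map and annotate_rows, the (transcript_id, fpkm) row layout, the placeholder ID ENSMUST00000000000, and the FPKM values are all hypothetical. It assumes the results are a list of dicts with a 'query' key and, for matched IDs, a 'symbol' key, as in the loop above.

```python
def build_symbol_map(results):
    """Map each queried ID to the first symbol reported for it."""
    symbol_map = {}
    for res in results:
        # Entries without a 'symbol' key (e.g. IDs that were not found) are skipped.
        if "symbol" in res:
            symbol_map.setdefault(res["query"], res["symbol"])
    return symbol_map


def annotate_rows(rows, symbol_map, missing="NA"):
    """Append a symbol (or a placeholder) to each (transcript_id, fpkm) row."""
    return [(tid, fpkm, symbol_map.get(tid, missing)) for tid, fpkm in rows]


# Illustrative data only; in practice `results` would come from mg.querymany().
example_results = [
    {"query": "ENSMUST00000073363", "symbol": "Amtn"},
    {"query": "ENSMUST00000000000", "notfound": True},  # made-up ID
]
example_rows = [("ENSMUST00000073363", 12.5), ("ENSMUST00000000000", 0.0)]

for row in annotate_rows(example_rows, build_symbol_map(example_results)):
    print("%s\t%s\t%s" % row)
```

Keeping unmatched rows with a placeholder, instead of dropping them, makes it easier to see which identifiers still need a different scope or a manual lookup.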
6.3 years ago
cilgaiscan ▴ 60

Hey! I use biomaRt in R. Here is my code for it; I hope it helps:

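library("biomaRt")
# Read the differential expression table (replace the path with your own file)
differentialexpression <- read.csv("put your file's path here", sep = ",", header = TRUE)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# Column of the table that holds the Ensembl gene IDs
values <- differentialexpression$columnnameensembl
# Mouse gene symbols are exposed as "mgi_symbol"; "hgnc_symbol" is meant for human genes
data <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"), filters = "ensembl_gene_id", values = values, mart = ensembl)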
6.3 years ago
Eric Lim ★ 2.2k

The Ensembl REST API is probably the quickest option, especially for single ID/gene lookups:

https://rest.ensembl.org/documentation/info/lookup

https://rest.ensembl.org/documentation/
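
As a rough illustration of the REST route, the sketch below builds request URLs for the lookup-by-ID endpoint and for the region-overlap endpoint, which may help with the chr/start/end labels from the question. The helper names (lookup_url, overlap_url, fetch_json, gene_names) are hypothetical, and it assumes that overlap results carry the gene name in an "external_name" field. Note also that rest.ensembl.org serves the current assembly, so coordinates taken from an older mouse build may not line up.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_url(ensembl_id):
    """URL for looking up a single Ensembl stable ID."""
    return "%s/lookup/id/%s" % (SERVER, ensembl_id)


def overlap_url(chrom, start, end, species="mus_musculus"):
    """URL for genes overlapping a region; a leading 'chr' is dropped."""
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return "%s/overlap/region/%s/%s:%d-%d?feature=gene" % (
        SERVER, species, chrom, start, end)


def fetch_json(url):
    """Send the GET request and decode the JSON body (needs network access)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def gene_names(overlap_features):
    """Collect gene names from overlap results (assumed field: 'external_name')."""
    return [f["external_name"] for f in overlap_features if "external_name" in f]


# Build the URLs without contacting the server.
print(lookup_url("ENSMUST00000143813"))
print(overlap_url("chr7", 52823165, 52830546))
```

With network access, gene_names(fetch_json(overlap_url("chr7", 52823165, 52830546))) would list whichever genes the server reports for that interval, and fetch_json(lookup_url(...)) returns the record for a single ID. For many identifiers, a batch approach such as querymany() is likely to be kinder to the server than one request per ID.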
