How to map UCSC transcripts to gene symbol?
6.3 years ago
wenbinm ▴ 40

Hi there,

I would like to map UCSC transcript IDs (mouse genome mm10; I downloaded the transcript IDs from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/refMrna.fa.gz) to gene symbols. I have a list of transcript IDs like 'NR_046233 2' and want to get a list of the corresponding gene symbols.

Does anyone know how to map each transcript ID to a gene symbol?

Thank you!

genome rna-seq assembly • 4.9k views
6.3 years ago
h.mon 35k

One option: use the DAVID conversion tool (https://david.ncifcrf.gov/conversion.jsp) and select OFFICIAL_GENE_SYMBOL.

Another option: use R, with the AnnotationDbi and org.Mm.eg.db packages:

library( AnnotationDbi )
library( org.Mm.eg.db )
geneSymbol <- select( org.Mm.eg.db, keys = "NR_000002",
                      columns = "SYMBOL",  keytype = "REFSEQ" )

I ended up finding a RefSeq annotation file called "refMrna.fa.gz" on the UCSC website, which gives me the mapping from transcripts to names. In any case, thank you for your reply!
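The IDs from that file carry a version after the accession ('NR_046233 2' in the question), while the lookups in this thread use the bare accession (NR_046233). Below is a minimal sketch for pulling bare accessions out of the FASTA headers. It assumes each header looks like '>NR_046233 2', as the ID in the question suggests; read_accessions is an illustrative name, not part of any library.

```python
import gzip


def read_accessions(fasta_gz_path):
    """Return unique accessions from FASTA headers, in file order.

    Assumes headers look like '>NR_046233 2' (accession, then version);
    a dotted form such as '>NR_046233.2' is handled as well.
    """
    seen = set()
    accessions = []
    with gzip.open(fasta_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            fields = line[1:].split()
            if not fields:
                continue  # header with no ID on it
            # Keep the first token and drop any '.version' suffix.
            accession = fields[0].split(".")[0]
            if accession not in seen:
                seen.add(accession)
                accessions.append(accession)
    return accessions
```

Calling read_accessions('refMrna.fa.gz') and writing the result one per line gives a file that can be fed to either of the lookups in this thread.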

6.3 years ago

Another option is to use MyGene (modified from this excellent answer):

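#!/usr/bin/env python

import sys
import mygene

# Collect unique RefSeq accessions, one per line, skipping blank lines.
ids = set()
with open('genes.txt', 'r') as f:
    for line in f:
        accession = line.strip()
        if accession:
            ids.add(accession)

# Query MyGene.info for the gene symbol of each accession.
m = mygene.MyGeneInfo()
r = m.querymany(list(ids),
                scopes='refseq',
                fields='symbol',
                species='mouse',
                as_dataframe=False)

# Queries with no match carry no 'symbol' key; print NA for those.
for e in r:
    sys.stdout.write("%s\t%s\n" % (e['query'], e.get('symbol', 'NA')))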

Given a test file called genes.txt containing:

NR_046233

The output looks like:

NR_046233       Rn45s
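With as_dataframe=False, querymany returns one dict per hit, and a query with no match is flagged with 'notfound' rather than carrying a 'symbol' key. Here is a hedged sketch of collapsing such a result into a lookup table. symbols_by_query is a hypothetical helper, not part of mygene, and it only relies on the 'query' and 'symbol' keys used in the script above.

```python
def symbols_by_query(hits, missing="NA"):
    """Map each query ID to a gene symbol.

    `hits` is a list of dicts shaped like the output of mygene's
    querymany(..., as_dataframe=False). Queries without a symbol get
    `missing`; if a query has several hits, the first one that has a
    symbol is kept.
    """
    table = {}
    for hit in hits:
        query = hit["query"]
        symbol = hit.get("symbol")
        if symbol is None:
            # No symbol for this hit: record the placeholder unless
            # an earlier hit already supplied something.
            table.setdefault(query, missing)
        elif table.get(query, missing) == missing:
            # First symbol seen for this query (or replacing a placeholder).
            table[query] = symbol
    return table
```

Passing the list r from the script above to symbols_by_query(r) gives a dict that can be indexed by transcript ID, which is handy when the symbols need to be joined back onto another table.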
