Question

How to map UCSC transcript IDs to gene symbols?

0


6.7 years ago

wenbinm ▴ 40

Hi there,

I would like to map UCSC transcript IDs (mouse genome mm10; I downloaded them from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/refMrna.fa.gz) to gene symbols. I have a list of transcript IDs like 'NR_046233 2' and want to get the corresponding gene symbols.

Does anyone know how to map each transcript ID to a gene symbol?

Thank you!

genome rna-seq assembly • 5.2k views

ADD COMMENT • link updated 6.7 years ago by Alex Reynolds 36k • written 6.7 years ago by wenbinm ▴ 40

0


6.7 years ago

Alex Reynolds 36k

Another option is to use MyGene (modified from this excellent answer):

#!/usr/bin/env python

import sys
import mygene

# Read one RefSeq accession per line, skipping blank lines and duplicates
ids = set()
with open('genes.txt', 'r') as f:
    for line in f:
        acc = line.strip()
        if acc:
            ids.add(acc)

m = mygene.MyGeneInfo()
r = m.querymany(list(ids),
                scopes='refseq',
                fields='symbol',
                species='mouse',
                as_dataframe=False)

# Queries with no match have no 'symbol' key, so fall back to 'NA'
for e in r:
    sys.stdout.write("%s\t%s\n" % (e['query'], e.get('symbol', 'NA')))

Given a test file called genes.txt containing:

NR_046233

The output looks like:

NR_046233       Rn45s
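
The IDs in the question look like 'NR_046233 2', while the test file above holds the bare accession NR_046233. Here is a minimal sketch for cleaning such IDs before the lookup, assuming the trailing number is a version that can safely be dropped. Note that bare_accession is a hypothetical helper written for this example, not part of mygene:

```python
import re

# Assumption: IDs start with a RefSeq-style accession (two capital letters,
# an underscore, then digits), optionally followed by a version such as
# ' 2' or '.2' that is not needed for the symbol lookup.
_ACCESSION = re.compile(r'^\s*([A-Z]{2}_\d+)')


def bare_accession(raw_id):
    """Return the bare accession from raw_id, or None if nothing matches."""
    match = _ACCESSION.match(raw_id)
    return match.group(1) if match else None


raw_ids = ['NR_046233 2', 'NR_046233.2', 'NR_046233', '']
# Drop unparseable entries and collapse duplicates
cleaned = sorted({acc for acc in map(bare_accession, raw_ids) if acc})
print(cleaned)
```

The cleaned list could then be written to genes.txt, one accession per line, and passed to the script above.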

ADD COMMENT • link 6.7 years ago by Alex Reynolds 36k

0


I ended up finding a RefSeq annotation file called "refMrna.fa.gz" on the UCSC website, which gives me the mapping from transcript IDs to names. Thank you for your reply!
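
For anyone taking the same route, here is a rough sketch of pulling the header lines out of a gzipped FASTA file. The header layout is an assumption (first whitespace-separated token is the transcript ID, the rest is kept as-is), and whether the rest of the header actually carries a gene name is not confirmed here. read_fasta_headers is a hypothetical helper, and the demo file is made up, reusing only the ID quoted in the question:

```python
import gzip
import os
import tempfile


def read_fasta_headers(path):
    """Yield (transcript_id, rest_of_header) for each '>' line in a gzipped FASTA."""
    with gzip.open(path, 'rt') as handle:
        for line in handle:
            if not line.startswith('>'):
                continue  # sequence line, not a header
            parts = line[1:].strip().split(None, 1)
            if not parts:
                continue  # empty header
            rest = parts[1] if len(parts) > 1 else ''
            yield parts[0], rest


# Demo on a tiny made-up file; swap in the path to the real download.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.fa.gz')
with gzip.open(demo_path, 'wt') as out:
    out.write('>NR_046233 2\nACGT\n')

headers = list(read_fasta_headers(demo_path))
print(headers)
```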

ADD REPLY • link 6.7 years ago by wenbinm ▴ 40

score 1 · Accepted Answer · 2018-08-15

1


6.7 years ago

h.mon 35k

One option: use the DAVID conversion tool (https://david.ncifcrf.gov/conversion.jsp) and select OFFICIAL_GENE_SYMBOL.

Another option: use R with the AnnotationDbi and org.Mm.eg.db packages:

library(AnnotationDbi)
library(org.Mm.eg.db)

# keys can also be a character vector of several RefSeq accessions
geneSymbol <- select(org.Mm.eg.db, keys = "NR_000002",
                     columns = "SYMBOL", keytype = "REFSEQ")

ADD COMMENT • link 6.7 years ago by h.mon 35k
