I would like to map UCSC transcript IDs (mouse genome mm10; I downloaded the transcript IDs from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/refMrna.fa.gz) to gene symbols. I have a list of transcript IDs like 'NR_046233 2' and want to get a list of the corresponding gene symbols.
Does anyone know how to map each transcript ID to a gene symbol?
I ended up finding a RefSeq annotation file called "refMrna.fa.gz" on the UCSC website, which gives me a mapping from transcripts to names. Thank you for your reply!
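#!/usr/bin/env python
import sys
import mygene

# Collect the unique transcript IDs, one per line in genes.txt
ids = set()
with open('genes.txt', 'r') as f:
    for line in f:
        fields = line.split()
        if fields:
            # Keep only the accession (e.g. 'NR_046233' from 'NR_046233 2')
            ids.add(fields[0])

# Query MyGene.info for the gene symbol of each RefSeq accession
m = mygene.MyGeneInfo()
r = m.querymany(list(ids),
                scopes='refseq',
                fields='symbol',
                species='mouse',
                as_dataframe=False)

# Print ID<TAB>symbol; IDs without a hit have no 'symbol' key, so print NA
for e in r:
    sys.stdout.write("%s\t%s\n" % (e['query'], e.get('symbol', 'NA')))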
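The script above expects a genes.txt file with one transcript ID per line. If you are starting from the FASTA file itself, here is a minimal sketch of how that list could be produced. It assumes each header line looks like '>NR_046233 2' (accession first, then a version number), as in the example from the question; the function names and file paths are made up for illustration.

```python
import gzip


def accessions_from_fasta(handle):
    """Yield the first whitespace-delimited token of every FASTA header line."""
    for line in handle:
        if line.startswith('>'):
            fields = line[1:].split()
            if fields:
                # Assumption: the accession is the first token of the header
                yield fields[0]


def write_accessions(fasta_gz_path, out_path):
    """Write one accession per line from a gzipped FASTA file (hypothetical paths)."""
    with gzip.open(fasta_gz_path, 'rt') as fasta, open(out_path, 'w') as out:
        for accession in accessions_from_fasta(fasta):
            out.write(accession + '\n')
```

Calling write_accessions('refMrna.fa.gz', 'genes.txt') should then give a file the script above can read. If your headers are laid out differently, the split step would need adjusting.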