Question

ucsc ids conversion

4.7 years ago

annaA ▴ 10

Hello,

I have a list of gene IDs from a GENCODE long non-coding RNA gene annotation file, and I want to translate them to UCSC IDs. I tried BioMart, but it doesn't really work: I am not getting the correct UCSC ID format. Can you suggest another converter, or another solution?

Thank you, Anna

ucsc genecode conversion ensamble • 1.5k views

ADD COMMENT • link updated 4.7 years ago by max ▴ 60 • written 4.7 years ago by annaA ▴ 10

I tried BioMart but it doesn't really work

Show us the code even if it didn't work.

ADD REPLY • link 4.7 years ago by zx8754 12k

Which assembly? Give us some examples.

ADD REPLY • link 4.7 years ago by Pierre Lindenbaum 164k

Can you explain a little why you want UCSC IDs? Note that UCSC IDs are no longer created; UCSC defaults to GENCODE IDs now. See the UCSC Genes FAQ: https://genome.ucsc.edu/FAQ/FAQgenes.html#hg19

ADD REPLY • link 4.7 years ago by max ▴ 60

I don't really need them. I am just creating a file of long non-coding RNAs that I am going to use for my project (TF and lncRNA expression networks). I already have a file for the TFs, and I have to use it as a template to create the one for lncRNAs. Based on the info you gave me, though, I don't need to keep searching for these IDs.

I am super new to this field, so sorry for the "silly" question.

ADD REPLY • link 4.7 years ago by annaA ▴ 10

score 2 · Answer 1 · 2020-03-02

Here's an example script to query data from the mygene.info service:

#!/usr/bin/env python

import sys
from mygene import MyGeneInfo

mg = MyGeneInfo()

# Ensembl gene IDs (without version suffix) to look up
ids = [
    "ENSG00000263726",
    "ENSG00000269044",
    "ENSG00000249849",
    "ENSG00000242770",
    "ENSG00000235356"
]

# Request the gene symbol and RefSeq annotations for each ID
results = mg.querymany(ids, fields="symbol,refseq", species="human", verbose=False)

# Print one tab-separated row per query: ID, RefSeq, symbol ('NA' when missing)
for res in results:
    q = res['query']
    r = res.get('refseq', 'NA')
    s = res.get('symbol', 'NA')
    sys.stdout.write('{}\t{}\t{}\n'.format(q, r, s))

To run it:

$ python ./query.py 
ENSG00000263726 NA      NA
ENSG00000269044 NA      AC024075.2
ENSG00000249849 NA      AC138819.1
ENSG00000242770 NA      CD200R1L-AS1
ENSG00000235356 NA      AL592466.1

The example list is just a few randomly picked records from GENCODE's current long non-coding RNA gene annotation dataset. As you can see, not all of them have HGNC symbols, and none of them has a RefSeq ID.

Still, this might be useful for your particular list of IDs, or it may give you an idea of how to query records via this tool, generally.
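
If your IDs are still sitting inside the GENCODE GTF file, something like the following could pull them out in the unversioned form used in the list above. This is only a rough sketch: the function name gene_ids_from_gtf and the example records (coordinates and version suffixes included) are made up for illustration, and it assumes the usual GTF layout of nine tab-separated columns with a gene_id "..." entry in the last one.

```python
import re

# Matches the gene_id entry in the GTF attribute column,
# e.g. gene_id "ENSG00000263726.1";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids_from_gtf(lines):
    """Return unique, unversioned gene IDs from GTF 'gene' records, in file order."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # only use gene-level records
        match = GENE_ID_RE.search(cols[8])
        if match:
            # Drop the version suffix: ENSG00000263726.1 -> ENSG00000263726
            seen[match.group(1).split(".")[0]] = None
    return list(seen)


# Made-up records for illustration only (not copied from a real GENCODE release)
example = [
    "##description: hypothetical example records\n",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG00000263726.1"; gene_type "lncRNA";\n',
    'chr1\tHAVANA\ttranscript\t100\t200\t.\t+\t.\tgene_id "ENSG00000263726.1"; transcript_id "ENST00000000000.1";\n',
    'chr2\tHAVANA\tgene\t300\t400\t.\t-\t.\tgene_id "ENSG00000269044.2"; gene_type "lncRNA";\n',
]

ids = gene_ids_from_gtf(example)
print(ids)
```

With a real file you would pass an open file handle instead of the example list, and the resulting ids list can go straight into mg.querymany in the script above.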