ucsc ids conversion
4.7 years ago
annaA ▴ 10

Hello,

I have a list of gene IDs from a GENCODE long non-coding RNA gene annotation file and I want to translate them to UCSC IDs. I tried BioMart, but it doesn't really work: the IDs I get back are not in the UCSC ID format. Can you suggest another converter, or another solution?

Thank you, Anna

ucsc genecode conversion ensamble • 1.5k views

I tried BioMart but it doesn't really work

Show us the code even if it didn't work.


Which assembly? Give us some examples.


Can you explain a little why you want UCSC IDs? Note that UCSC IDs are not created anymore; UCSC defaults to GENCODE IDs now. See the UCSC Genes FAQ: https://genome.ucsc.edu/FAQ/FAQgenes.html#hg19


I don't really need them. I am just creating a file of long non-coding RNAs to use in my project (TF and lncRNA expression networks). I already have a file for the TFs, and I have to use it as a template to create the one for lncRNAs. Based on the info you gave me, though, I don't need to keep searching for these IDs.

I am super new to this field, so sorry for the "silly" question.

4.7 years ago

Here's an example script to query data from the mygene.info service:

#!/usr/bin/env python

import sys
from mygene import MyGeneInfo

mg = MyGeneInfo()

ids = [
    "ENSG00000263726",
    "ENSG00000269044",
    "ENSG00000249849",
    "ENSG00000242770",
    "ENSG00000235356"
]

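# Ask for the gene symbol and RefSeq annotation of every ID in one batch call.
results = mg.querymany(ids, fields="symbol,refseq", species="human", verbose=False)

# Print one tab-separated line per hit: query ID, RefSeq field, symbol.
# Fields missing from a hit are reported as 'NA'.
for res in results:
    q = res['query']
    r = res.get('refseq', 'NA')
    s = res.get('symbol', 'NA')
    sys.stdout.write('{}\t{}\t{}\n'.format(q, r, s))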

To run it:

$ python ./query.py 
ENSG00000263726 NA      NA
ENSG00000269044 NA      AC024075.2
ENSG00000249849 NA      AC138819.1
ENSG00000242770 NA      CD200R1L-AS1
ENSG00000235356 NA      AL592466.1

The example list is a handful of records picked at random from GENCODE's current long non-coding RNA gene annotation dataset. As you can see, not all of them have HGNC symbols, and none of them has a RefSeq ID.
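
If you want the output to tell "ID not found at all" apart from "found, but no symbol or RefSeq entry", something like the sketch below might help. It works on plain dicts, so it runs without a network connection. Two things here are assumptions rather than facts from this thread: that mygene flags unmatched queries with a 'notfound' key, and that a 'refseq' field, when present, is a dict whose 'rna' entry is a string or a list. The helper name format_hit and the NOT_FOUND marker are made up for illustration.

```python
def format_hit(res):
    """Turn one mygene-style hit (a plain dict) into a tab-separated line."""
    query = res.get("query", "NA")
    if res.get("notfound"):
        # Assumption: unmatched queries come back flagged with 'notfound'.
        return "\t".join([query, "NOT_FOUND", "NOT_FOUND"])
    symbol = res.get("symbol", "NA")
    refseq = res.get("refseq", "NA")
    if isinstance(refseq, dict):
        # Assumption: 'refseq' is a dict whose 'rna' entry is a str or a list.
        rna = refseq.get("rna", "NA")
        refseq = ",".join(rna) if isinstance(rna, list) else rna
    return "\t".join([query, str(refseq), symbol])


if __name__ == "__main__":
    # Hand-written dicts for illustration, not real mygene.info responses.
    hits = [
        {"query": "ENSG00000263726", "notfound": True},
        {"query": "ENSG00000242770", "symbol": "CD200R1L-AS1"},
    ]
    for hit in hits:
        print(format_hit(hit))
```

In the script above you would call format_hit(res) inside the loop instead of building the line by hand.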

Still, this might be useful for your particular list of IDs, or it may give you an idea of how to query records via this tool, generally.
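
To build the ids list from your own annotation file rather than typing it in, a rough sketch like the following could work. It assumes a standard tab-separated GTF where the ninth column holds a gene_id "..." attribute, and it strips the version suffix that GENCODE adds (e.g. ENSG00000263726.1), since the IDs queried above are unversioned. The function name gene_ids_from_gtf and the file name in the usage comment are hypothetical.

```python
import re

# Matches the gene_id attribute in the 9th GTF column.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids_from_gtf(lines):
    """Collect unique, version-stripped gene IDs from GTF 'gene' lines, in file order."""
    ordered = []
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records
        match = GENE_ID_RE.search(fields[8])
        if not match:
            continue
        # Drop everything after the first dot, i.e. the version suffix.
        gene_id = match.group(1).split(".")[0]
        if gene_id not in seen:
            seen.add(gene_id)
            ordered.append(gene_id)
    return ordered


# Usage (hypothetical file name):
# with open("gencode.long_noncoding_RNAs.gtf") as fh:
#     ids = gene_ids_from_gtf(fh)
```

The resulting list can be passed straight to mg.querymany() in place of the hard-coded ids.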


Thank you so much for the code!!
