Question

How to convert UCSC transcript (isoform) IDs to Ensembl transcript IDs?

0


8.6 years ago

jack ▴ 980

Hi all,

I have UCSC isoform IDs (~73,000) and I want to convert them to Ensembl transcript IDs (ENST000....). I've tried UCSC and BioMart, but without success. I think these IDs belong to genome assembly hg19. Could someone help me with this? Here are some of the UCSC isoform IDs:

"531"   "uc002ldw.3"
"532"   "uc002ldx.3"
"533"   "uc002hnk.2"
"534"   "uc002hnl.2"
"535"   "uc002hnm.2"
"536"   "uc002hnn.2"
"537"   "uc002hno.2"
"538"   "uc002hnp.1"
"539"   "uc002hnq.2"
"540"   "uc002hnr.1"
"541"   "uc010cuy.2"
"542"   "uc010cuz.2"
"543"   "uc010wdb.1"
"544"   "uc010wdc.1"
"545"   "uc001tob.2"
"546"   "uc001toc.2"
"547"   "uc001tod.2"
"548"   "uc010sxl.1"
"549"   "uc010sxm.1"
"550"   "uc001tso.3"
"551"   "uc001tsp.2"
"552"   "uc001tsq.2"
"553"   "uc001tsr.2"
"554"   "uc001tss.1"
"555"   "uc009zvw.2"
"556"   "uc009zvx.2"
"557"   "uc003eov.3"
"558"   "uc003eoy.2"
"559"   "uc011blr.1"
"560"   "uc001qhk.2"
"561"   "uc001qhl.2"
"562"   "uc009zdc.2"
"563"   "uc009zde.1"
"564"   "uc010sco.1"
"565"   "uc010scp.1"
"566"   "uc010scq.1"
"567"   "uc010scr.1"
"568"   "uc003ela.3"
"569"   "uc003elb.2"
"570"   "uc003elc.1"
"571"   "uc003eld.1"
"572"   "uc003ele.2"
"573"   "uc010hsw.1"
"574"   "uc011bks.1"
"575"   "uc002vdz.3"
"576"   "uc010zjg.1"
"577"   "uc001dgw.3"
"578"   "uc001dgx.3"
"579"   "uc009wbp.2"
"580"   "uc009wbr.2"
"581"   "uc009wbs.1"
"582"   "uc010orc.1"
"583"   "uc010ord.1"
"584"   "uc010ore.1"
"585"   "uc010orf.1"
"586"   "uc010org.1"
"587"   "uc001lhb.2"
"588"   "uc010qub.1"
"589"   "uc001tza.3"
"590"   "uc001tzb.3"
"591"   "uc010szl.1"
"592"   "uc002gev.2"
"593"   "uc002gew.2"

RNA-Seq Assembly genomics Ensembl USCS • 6.2k views

ADD COMMENT • link updated 8.6 years ago by Emily 24k • written 8.6 years ago by jack ▴ 980

score 1 · Answer 1 · 2016-05-11

1


8.6 years ago

ivivek_ngs ★ 5.2k

Are you sure they are from human, and do you know whether they are from hg19, hg18, or the latest assembly? You need to be sure which assembly they come from; otherwise you cannot proceed correctly. In any case, you can download the full gene list from the UCSC browser. Take a look at this link, or at the wonderful MySQL one-liner mentioned in the link, and download all of them. Then you can use merge in R to map your isoforms to the corresponding transcript IDs.
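The download-then-merge idea can be sketched roughly as follows. This is a minimal illustration in Python rather than R, and it assumes the downloaded file is a two-column, tab-delimited table of UCSC ID and Ensembl transcript ID (for hg19, UCSC's `knownToEnsembl` table has this shape, but treat the table name and layout as an assumption to verify). The function names `load_mapping` and `merge_ids` are made up for this example, and the single mapping row is the pair quoted elsewhere in this thread.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column, tab-delimited table (UCSC ID, Ensembl transcript ID) into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines and a commented header line
        mapping[row[0]] = row[1]
    return mapping


def merge_ids(ucsc_ids, mapping):
    """Left join: keep every input ID, with None where no Ensembl ID is found."""
    return [(ucsc_id, mapping.get(ucsc_id)) for ucsc_id in ucsc_ids]


# Stand-in for the downloaded table (assumed layout); replace with open("your_file.tsv").
table = io.StringIO("#name\tvalue\nuc010ajn.1\tENST00000390462\n")
mapping = load_mapping(table)

for ucsc_id, enst in merge_ids(["uc010ajn.1", "uc002ldw.3"], mapping):
    print(ucsc_id, enst if enst is not None else "NA", sep="\t")
```

Keeping unmatched IDs in the output (as NA) instead of dropping them makes it easy to see how many of the ~73,000 IDs actually fail to map.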

ADD COMMENT • link 8.6 years ago by ivivek_ngs ★ 5.2k

0


These are human transcriptome IDs.

ADD REPLY • link 8.6 years ago by jack ▴ 980

1


Then do you know which assembly: hg18, hg19, or GRCh38? If you do, the link I gave above shows different examples of downloading the full set of RefSeq IDs together with transcript and other IDs. You can then merge your transcript IDs with the downloaded file in R and retrieve all the relevant information.

ADD REPLY • link 8.6 years ago by ivivek_ngs ★ 5.2k

0


It's hg19, but that doesn't seem to work.

ADD REPLY • link 8.6 years ago by jack ▴ 980

0


Please be specific about what does not work and what you are doing that gives an unexpected result. Did you download the tab-delimited file from UCSC, either through the browser or with the mysql command, and then run R merge on both files? Can you show the output of the download and the command you used for merging? Otherwise it will be difficult for us to debug. Both files must have the same format; alternatively, put them in vectors in R and then match on the columns.
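A small check like the one below is the kind of output that helps others debug. It is only a sketch in Python (the thread suggests R), assuming the input lines look like the ones pasted in the question, i.e. a quoted row number followed by a quoted ID. The helper names `parse_id_lines` and `match_report` are hypothetical, and `known_ids` stands in for the ID column of whatever table was downloaded.

```python
import shlex


def parse_id_lines(lines):
    """Extract the ID from lines such as '"531"   "uc002ldw.3"' (row number, then ID)."""
    ids = []
    for line in lines:
        fields = shlex.split(line)  # splits on whitespace and drops the double quotes
        if fields:
            ids.append(fields[-1].strip())
    return ids


def match_report(ids, known_ids):
    """Split the input IDs into those present in the reference set and those missing."""
    known = set(known_ids)
    matched = [i for i in ids if i in known]
    unmatched = [i for i in ids if i not in known]
    return matched, unmatched


# Two lines copied from the question; known_ids is a stand-in for the downloaded ID column.
lines = ['"531"   "uc002ldw.3"', '"532"   "uc002ldx.3"']
known_ids = ["uc010ajn.1"]

ids = parse_id_lines(lines)
matched, unmatched = match_report(ids, known_ids)
print(f"{len(matched)} matched, {len(unmatched)} unmatched")
print("first unmatched IDs:", unmatched[:5])
```

Stripping the quotes and whitespace before comparing matters: an ID that still carries its literal double quotes will never equal the bare ID in the downloaded table.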

ADD REPLY • link 8.6 years ago by ivivek_ngs ★ 5.2k

0


I have a similar problem. I downloaded level 3 isoform-level data from TCGA, and it uses UCSC IDs. I tried BioMart, DAVID, and some other online conversion tools. The problem is that most of the IDs do not match any Ensembl ID. I also tried searching the other way round (i.e. starting from my transcript of interest and looking for its corresponding UCSC ID), but could not find it. Am I missing something here?

ADD REPLY • link 7.8 years ago by snishtala03 ▴ 70

0


BioMart can convert UCSC IDs to Ensembl IDs. 'uc010ajn.1' is converted to ENSG00000211814 (gene) and ENST00000390462 (transcript). However, the IDs from your list do not seem to work. These same IDs cannot be found in UCSC either when using their Table Browser, whereas 'uc010ajn.1' can.
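One way to narrow down why the IDs are not found is to test whether they differ from the current ones only in the version suffix (the `.3` in `uc002ldw.3`). That cause is an assumption and is not confirmed anywhere in this thread. The sketch below is Python with hypothetical names (`strip_version`, `classify`), and the reference ID `uc002ldw.4` is made up purely for illustration.

```python
def strip_version(ucsc_id):
    """'uc002ldw.3' -> 'uc002ldw'; IDs without a dot are returned unchanged."""
    return ucsc_id.rsplit(".", 1)[0]


def classify(ids, reference_ids):
    """Label each ID as an exact hit, a hit that differs only by version, or absent."""
    reference = set(reference_ids)
    reference_bases = {strip_version(r) for r in reference}
    result = {}
    for ucsc_id in ids:
        if ucsc_id in reference:
            result[ucsc_id] = "exact"
        elif strip_version(ucsc_id) in reference_bases:
            result[ucsc_id] = "version differs"
        else:
            result[ucsc_id] = "absent"
    return result


# 'uc002ldw.4' is an invented reference entry used only to show the second category.
reference_ids = ["uc010ajn.1", "uc002ldw.4"]
for ucsc_id, status in classify(["uc010ajn.1", "uc002ldw.3", "uc002ldx.3"], reference_ids).items():
    print(ucsc_id, status, sep="\t")
```

If most IDs land in the "version differs" group, the list probably comes from a different release of the UCSC gene set than the table being queried, and a version-stripped match should be checked by hand before it is trusted.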

ADD REPLY • link 8.6 years ago by Denise CS ★ 5.2k

0


So the input source is not correct, then. The OP needs to find out what the proper IDs are. The online BioMart should also be able to do what you suggested; I was just showing how to try this both through the browser and programmatically. Unless the OP gets the correct IDs, none of the above suggestions will work.

Note: Know where your source data comes from. Keep a record of all sources used as input for downstream work. It makes life easier to know what you are using and what you intend to do, so that even if something breaks, people can offer suggestions or help debug it.

ADD REPLY • link 8.6 years ago by ivivek_ngs ★ 5.2k