Hi all,
I have about 73,000 UCSC isoform IDs and I want to convert them to Ensembl transcript IDs (ENST000....). I've tried UCSC and BioMart, but without success. I think these IDs belong to genome assembly hg19. Could someone help me with that? Here are some of the UCSC isoform IDs:
"531" "uc002ldw.3"
"532" "uc002ldx.3"
"533" "uc002hnk.2"
"534" "uc002hnl.2"
"535" "uc002hnm.2"
"536" "uc002hnn.2"
"537" "uc002hno.2"
"538" "uc002hnp.1"
"539" "uc002hnq.2"
"540" "uc002hnr.1"
"541" "uc010cuy.2"
"542" "uc010cuz.2"
"543" "uc010wdb.1"
"544" "uc010wdc.1"
"545" "uc001tob.2"
"546" "uc001toc.2"
"547" "uc001tod.2"
"548" "uc010sxl.1"
"549" "uc010sxm.1"
"550" "uc001tso.3"
"551" "uc001tsp.2"
"552" "uc001tsq.2"
"553" "uc001tsr.2"
"554" "uc001tss.1"
"555" "uc009zvw.2"
"556" "uc009zvx.2"
"557" "uc003eov.3"
"558" "uc003eoy.2"
"559" "uc011blr.1"
"560" "uc001qhk.2"
"561" "uc001qhl.2"
"562" "uc009zdc.2"
"563" "uc009zde.1"
"564" "uc010sco.1"
"565" "uc010scp.1"
"566" "uc010scq.1"
"567" "uc010scr.1"
"568" "uc003ela.3"
"569" "uc003elb.2"
"570" "uc003elc.1"
"571" "uc003eld.1"
"572" "uc003ele.2"
"573" "uc010hsw.1"
"574" "uc011bks.1"
"575" "uc002vdz.3"
"576" "uc010zjg.1"
"577" "uc001dgw.3"
"578" "uc001dgx.3"
"579" "uc009wbp.2"
"580" "uc009wbr.2"
"581" "uc009wbs.1"
"582" "uc010orc.1"
"583" "uc010ord.1"
"584" "uc010ore.1"
"585" "uc010orf.1"
"586" "uc010org.1"
"587" "uc001lhb.2"
"588" "uc010qub.1"
"589" "uc001tza.3"
"590" "uc001tzb.3"
"591" "uc010szl.1"
"592" "uc002gev.2"
"593" "uc002gew.2"
These are human transcriptome IDs.
Then do you know which assembly they are from: hg18, hg19, or GRCh38? If you know, the above link shows different examples of how to download the entire set of RefSeq IDs together with transcript and other IDs. You can then just merge your transcript IDs with the downloaded file in R and retrieve all the useful information.
It's hg19, but that doesn't seem to work.
Please be specific about what does not work and what you are doing that gives an unexpected result. Did you download the tab-delimited file from UCSC, either through the browser or with the mysql command, and then run R merge on both files? Can you show the output of the download and the command you used for merging? Otherwise it will be difficult for us to debug. Both files have to be in the same format; alternatively, put them into vectors in R and then match on the columns.
I have a similar problem. I downloaded level 3 isoform-level data from TCGA, which uses UCSC IDs. I tried BioMart, DAVID, and some other online conversion tools. The problem is that most of the IDs do not match any Ensembl ID. I also tried the other way round online (i.e. starting from my transcript of interest and looking for its corresponding UCSC ID) but could not find it. Am I missing something here?
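For what it's worth, the merge step discussed above can also be sketched in plain Python, which makes it easy to see exactly which IDs fail to match. This is only a rough sketch under assumptions: the mapping file is taken to be a two-column, tab-delimited download (UCSC ID, then Ensembl transcript ID), and the function names `parse_ids`, `load_mapping` and `convert` are made up for illustration. The `ignore_version` fallback is a guess at one possible cause of mismatches (the same base ID with a different version suffix in another release), not something confirmed in this thread, and a different version may describe a different transcript model.

```python
import csv
import io
import re

# Matches UCSC known-gene style IDs such as uc002ldw.3 (base ID plus version).
UCSC_ID = re.compile(r"uc\d{3}[a-z]{3}\.\d+")


def parse_ids(text):
    """Extract UCSC IDs from R-style output lines like '"531" "uc002ldw.3"'."""
    return UCSC_ID.findall(text)


def load_mapping(handle):
    """Read an ASSUMED two-column TSV: UCSC ID, Ensembl transcript ID."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip '#'-prefixed header lines and malformed rows
        mapping[row[0]] = row[1]
    return mapping


def convert(ids, mapping, ignore_version=False):
    """Return (matched, unmatched).

    Exact matches are tried first. With ignore_version=True, an ID that has
    no exact match is retried on its base ID only (text before the dot).
    Treat such hits with caution: versions can differ in structure.
    """
    by_base = {ucsc.split(".")[0]: enst for ucsc, enst in mapping.items()}
    matched, unmatched = {}, []
    for ucsc in ids:
        base = ucsc.split(".")[0]
        if ucsc in mapping:
            matched[ucsc] = mapping[ucsc]
        elif ignore_version and base in by_base:
            matched[ucsc] = by_base[base]
        else:
            unmatched.append(ucsc)
    return matched, unmatched


if __name__ == "__main__":
    # The only mapping pair used here is the one quoted in this thread.
    table = io.StringIO("#name\tvalue\nuc010ajn.1\tENST00000390462\n")
    ids = parse_ids('"1" "uc010ajn.1"\n"531" "uc002ldw.3"\n')
    matched, unmatched = convert(ids, load_mapping(table))
    print("matched:", matched)
    print("unmatched:", unmatched)
```

Printing the unmatched list (and how long it is) is the part worth posting when asking for help, since it shows whether a handful of IDs fail or nearly all of them do.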
BioMart can convert UCSC IDs to Ensembl IDs. For example, 'uc010ajn.1' is converted to ENSG00000211814 (gene) and ENST00000390462 (transcript). However, the IDs from your list do not seem to work. These same IDs cannot be found in UCSC either when using their Table Browser, whereas 'uc010ajn.1' can.
So the input source is not correct, then; the OP needs to find out what the proper IDs are. The online BioMart should be able to do the same thing you suggested; I was just giving an idea of how to do it both through the browser and programmatically. Unless the OP gets the correct IDs, none of the above suggestions will work.
Note: Know where your source data come from. Try to keep a document listing all sources that will be used as input for downstream work. It makes life easier to know what you are using and what you intend to do, so that even if something is broken, people can give suggestions or help debug it.