Probe ID to Ensembl ID conversion
4.0 years ago
dxodnd • 0

There seems to be no tool to convert ENSEMBL to probe id

Besides, the tools that convert Ensembl IDs to gene symbols do not seem reliable either.

third_translation = select(hgu133a.db, as.character(third_data_id), c('SYMBOL', 'ENTREZID', 'GENENAME'), keytype="ENSEMBL")$SYMBOL
first_data_id[928] # result 201400_at
first_translation[928] # result NA
select(hgu133a.db, as.character(first_data_id[927:928]), c('SYMBOL'))$SYMBOL

The last line of code, however, converts properly.

plz save me...

conversion ensembl probe id • 2.0k views

Hi, Biostars is indexed by Google, so every question and answer has the potential to become useful to others through a simple Google search. It is therefore essential to keep the metadata of a question (title and tags) as clear and precise as possible. Please edit your current title, as it does not currently provide any useful information. See How To Ask Good Questions On Technical And Scientific Forums


I have edited the question title


What is the question? What programming language are you using? If you find that some useful functionality is missing from a library, feel free to contribute code to fill the gap. I am sure the authors of the library and the larger community will be grateful.

4.0 years ago

You should be able to convert back to probe ID in different ways:

1. Create a master annotation table

Here, we simply output all records from the database. You can then later use this to look up the probe IDs manually.

library(hgu133a.db)

# Retrieve every probe ID together with its symbol, Entrez ID and Ensembl ID
annotMaster1 <- select(hgu133a.db,
  keys = keys(hgu133a.db, 'PROBEID'),
  columns = c('PROBEID', 'SYMBOL', 'ENTREZID', 'ENSEMBL'),
  keytype = 'PROBEID')

dim(annotMaster1)
[1] 28437     4

head(annotMaster1)
    PROBEID SYMBOL ENTREZID         ENSEMBL
1 1007_s_at   DDR1      780 ENSG00000204580
2 1007_s_at   DDR1      780 ENSG00000234078
3 1007_s_at   DDR1      780 ENSG00000215522
4 1007_s_at   DDR1      780 ENSG00000137332
5 1007_s_at   DDR1      780 ENSG00000223680
6 1007_s_at   DDR1      780 ENSG00000229767

Note that it's a bit tidier without those Ensembl entries:

# Same query without the ENSEMBL column, which removes many repeated rows
annotMaster2 <- select(hgu133a.db,
  keys = keys(hgu133a.db, 'PROBEID'),
  columns = c('PROBEID', 'SYMBOL', 'ENTREZID'),
  keytype = 'PROBEID')

dim(annotMaster2)
[1] 24468     3

head(annotMaster2)
    PROBEID  SYMBOL  ENTREZID
1 1007_s_at    DDR1       780
2 1007_s_at MIR4640 100616237
3   1053_at    RFC2      5982
4    117_at   HSPA6      3310
5    121_at    PAX8      7849
6 1255_g_at  GUCA1A      2978
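
As a minimal sketch of the manual lookup mentioned above, the master table that includes the ENSEMBL column can be filtered with base R. The small data frame below only re-types a few rows of the `head(annotMaster1)` output so that the example runs without the annotation package. The names `toy_master`, `wanted`, `hits` and `missing_ids` are made up for illustration, and `ENSG00000000000` is a deliberately fake ID.

```r
# Toy stand-in for annotMaster1 (rows copied from the output shown above)
toy_master <- data.frame(
  PROBEID  = c("1007_s_at", "1007_s_at", "1007_s_at"),
  SYMBOL   = c("DDR1", "DDR1", "DDR1"),
  ENTREZID = c("780", "780", "780"),
  ENSEMBL  = c("ENSG00000204580", "ENSG00000234078", "ENSG00000215522"),
  stringsAsFactors = FALSE
)

# Ensembl IDs to look up; the second one is a fake ID that cannot match
wanted <- c("ENSG00000204580", "ENSG00000000000")

# Keep only the rows of the master table whose Ensembl ID was requested
hits <- toy_master[toy_master$ENSEMBL %in% wanted, c("ENSEMBL", "PROBEID")]

# Requested IDs that have no row in the master table
missing_ids <- setdiff(wanted, toy_master$ENSEMBL)
```

With the real table, replacing `toy_master` with `annotMaster1` should work the same way. Checking `missing_ids` is a quick way to see which of your IDs the chip annotation does not cover at all.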

2. Directly look up the probe IDs for each Ensembl ID

ens_ids <- unique(annotMaster1$ENSEMBL)

# Query the database with Ensembl IDs as keys to get the matching probe IDs
lookup <- select(hgu133a.db,
  keys = ens_ids,
  columns = c('PROBEID', 'SYMBOL', 'ENSEMBL'),
  keytype = 'ENSEMBL')

head(lookup)
          ENSEMBL     PROBEID SYMBOL
1 ENSG00000204580   1007_s_at   DDR1
2 ENSG00000204580 207169_x_at   DDR1
3 ENSG00000204580 208779_x_at   DDR1
4 ENSG00000204580 210749_x_at   DDR1
5 ENSG00000234078   1007_s_at   DDR1
6 ENSG00000234078 207169_x_at   DDR1
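
Because one Ensembl ID returns several probe IDs here, it can help to group the result by gene. The following is only a sketch in base R on a toy copy of the `head(lookup)` rows; `toy_lookup`, `probes_by_gene` and `collapsed` are hypothetical names, and the `;` separator is an arbitrary choice.

```r
# Toy stand-in for `lookup` (rows copied from the head(lookup) output above)
toy_lookup <- data.frame(
  ENSEMBL = c(rep("ENSG00000204580", 4), rep("ENSG00000234078", 2)),
  PROBEID = c("1007_s_at", "207169_x_at", "208779_x_at", "210749_x_at",
              "1007_s_at", "207169_x_at"),
  SYMBOL  = "DDR1",
  stringsAsFactors = FALSE
)

# One list element per Ensembl ID, holding all of its probe IDs
probes_by_gene <- split(toy_lookup$PROBEID, toy_lookup$ENSEMBL)

# Alternatively, a flat table with the probe IDs pasted into one string
collapsed <- aggregate(PROBEID ~ ENSEMBL, data = toy_lookup,
                       FUN = paste, collapse = ";")
```

The list form is convenient for programmatic lookups such as `probes_by_gene[["ENSG00000204580"]]`, while the collapsed table is easier to write out to a file.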

Kevin


Thank you so much, you are a lifesaver. I just want to ask a few more questions. 1. Why did I get an error with my original method? 2. I tried using databases such as lumiHumanAll to convert Ensembl IDs to genes. However, every database seems to have only about 20,000 Ensembl IDs.

library('lumiHumanAll.db')
result3 = select(lumiHumanAll.db, keys = keys(lumiHumanAll.db, 'PROBEID'), column = c('SYMBOL', 'ENSEMBL', 'REFSEQ', 'GENENAME'))

First, I don't know why I get an error once I enter 'ENSEMBL' in the keys (it says that 'PROBEID' is required). Second, there are 22223 results.

x <- lumiHumanAllENSEMBL
# Get the entrez gene IDs that are mapped to an Ensembl ID
mapped_genes <- mappedkeys(x)
# Convert to a list
xx <- as.list(x[mapped_genes])

The length of xx is 44765. Where did the 24,000 go? The number of Ensembl IDs I need is 37045 (Illumina HiSeq 2000, Homo sapiens). Bioinformatics is not as easy as I thought. Thanks for reading. :)


First, I don't know why I get an error once I enter 'ENSEMBL' in the keys (it says that 'PROBEID' is required). Second, there are 22223 results.

You need to specify keytype:

library('lumiHumanAll.db')

keytypes(lumiHumanAll.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GO"           "GOALL"        "IPI"          "MAP"          "OMIM"        
[16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"        
[21] "PROBEID"      "PROSITE"      "REFSEQ"       "SYMBOL"       "UCSCKG"      
[26] "UNIGENE"      "UNIPROT"

# keytype must match the kind of ID passed in `keys`
result3 <- select(lumiHumanAll.db,
  keys = keys(lumiHumanAll.db, 'ENSEMBL'),
  columns = c('SYMBOL', 'ENSEMBL', 'REFSEQ', 'GENENAME'),
  keytype = 'ENSEMBL')

head(result3)
          ENSEMBL SYMBOL       REFSEQ               GENENAME
1 ENSG00000121410   A1BG    NM_130786 alpha-1-B glycoprotein
2 ENSG00000121410   A1BG    NP_570602 alpha-1-B glycoprotein
3 ENSG00000175899    A2M    NM_000014  alpha-2-macroglobulin
4 ENSG00000175899    A2M NM_001347423  alpha-2-macroglobulin
5 ENSG00000175899    A2M NM_001347424  alpha-2-macroglobulin
6 ENSG00000175899    A2M NM_001347425  alpha-2-macroglobulin

I am not sure about the other issue, but there will always be situations where a single Ensembl gene ID maps to multiple gene symbols, and vice versa. You have to specify your own rules about how to deal with these situations.
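
To illustrate what such a rule could look like, here is a sketch in base R on a toy copy of the first few `head(result3)` rows. It shows that the row count of a `select()` result is not the same thing as the number of distinct Ensembl IDs, which is worth checking when comparing totals. Keeping only one-to-one mappings is just one possible rule, not a recommendation, and the names `toy_result`, `gene_level` and `one_to_one` are made up.

```r
# Toy stand-in for result3 (rows copied from the output shown above)
toy_result <- data.frame(
  ENSEMBL = c("ENSG00000121410", "ENSG00000121410",
              "ENSG00000175899", "ENSG00000175899"),
  SYMBOL  = c("A1BG", "A1BG", "A2M", "A2M"),
  REFSEQ  = c("NM_130786", "NP_570602", "NM_000014", "NM_001347423"),
  stringsAsFactors = FALSE
)

# The row count is inflated by the one-to-many REFSEQ column
n_rows <- nrow(toy_result)

# Number of distinct Ensembl IDs
n_genes <- length(unique(toy_result$ENSEMBL))

# Gene-level table: drop REFSEQ and remove the repeated rows
gene_level <- unique(toy_result[, c("ENSEMBL", "SYMBOL")])

# Example rule: keep only Ensembl IDs that map to exactly one symbol
symbols_per_id <- table(gene_level$ENSEMBL)
unique_ids <- names(symbols_per_id)[symbols_per_id == 1]
one_to_one <- gene_level[gene_level$ENSEMBL %in% unique_ids, ]
```

Other rules are equally defensible, for example keeping the first symbol per ID or keeping all mappings in a list column; which one fits depends on the downstream analysis.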
