Here's an example of converting Ensembl gene IDs to Entrez gene IDs with biomaRt. You can query with the versioned IDs you have, but as you'll see, it's better to remove the version numbers first.
First, we'll load biomaRt and use your example IDs.
```r
library(biomaRt)

# Connect to the human gene dataset on the Ensembl BioMart
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

gene_ids_version <- c("ENSG00000236246.1",
                      "ENSG00000281088.1",
                      "ENSG00000254526.1",
                      "ENSG00000223575.2",
                      "ENSG00000201444.1",
                      "ENSG00000232573.1")
```
Now we can query BioMart, using `ensembl_gene_id_version` as the filter so that the versioned Ensembl gene IDs are matched exactly as given:
```r
getBM(attributes = c('ensembl_gene_id_version',
                     'entrezgene'),  # called 'entrezgene_id' in newer Ensembl releases
      filters = 'ensembl_gene_id_version',
      values = gene_ids_version,
      mart = mart)
```

```
  ensembl_gene_id_version entrezgene
1       ENSG00000201444.1         NA
2       ENSG00000223575.2         NA
3       ENSG00000232573.1         NA
4       ENSG00000254526.1         NA
```
However, notice that only 4 of our 6 IDs return results. If you query with a version number that isn't the most recent version in Ensembl, no result is returned for that ID, which is not really ideal. It's better to do as Emily suggests and strip the version number, leaving just the Ensembl gene ID. We'll use the stringr package to do that here:
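```r
library(stringr)

# Remove a trailing ".<digits>" version suffix. The dot is escaped so it
# matches a literal period rather than any character.
gene_ids <- str_replace(gene_ids_version,
                        pattern = "\\.[0-9]+$",
                        replacement = "")
```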
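Why the dot needs escaping: in a regular expression an unescaped `.` matches any character, so a pattern such as `.[0-9]+$` happens to work on versioned IDs but mangles an ID that has no version suffix. The sketch below uses base R's `sub()`, which with the escaped pattern should behave like the `str_replace()` call; `strip_version` is a hypothetical helper name, not part of biomaRt or stringr, and it assumes IDs of the form `ENSG...` with an optional `.N` version.

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix.
# The escaped dot only matches a literal period, so IDs that have no
# version suffix are returned unchanged.
strip_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

strip_version(c("ENSG00000236246.1", "ENSG00000223575.2"))
# [1] "ENSG00000236246" "ENSG00000223575"

strip_version("ENSG00000236246")
# [1] "ENSG00000236246"

# With an unescaped dot, "G" followed by the trailing digits counts as
# a match, so an unversioned ID is cut down to its prefix:
sub(".[0-9]+$", "", "ENSG00000236246")
# [1] "ENS"
```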
Now rerun the query with the trimmed IDs and you'll get 5 results this time:
```r
getBM(attributes = c('ensembl_gene_id',
                     'entrezgene'),
      filters = 'ensembl_gene_id',
      values = gene_ids,
      mart = mart)
```

```
  ensembl_gene_id entrezgene
1 ENSG00000201444         NA
2 ENSG00000223575         NA
3 ENSG00000232573         NA
4 ENSG00000236246         NA
5 ENSG00000254526         NA
```
The one entry that is missing completely is ENSG00000281088: that gene has been retired from Ensembl, so it will never return a result. The NA values for the rest mean there is no mapping between Ensembl and Entrez for those genes.
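To tell these two cases apart programmatically, you can compare the IDs you sent with the table that comes back: an ID absent from the result was not found at all, while an ID present only with NA has no Entrez mapping. A minimal base-R sketch, with the `getBM()` result from the unversioned query above typed in by hand so it runs offline; `classify_ids` is a hypothetical helper, not a biomaRt function.

```r
# The unversioned getBM() result shown above, typed in by hand so this
# sketch runs without a network connection.
res <- data.frame(
  ensembl_gene_id = c("ENSG00000201444", "ENSG00000223575",
                      "ENSG00000232573", "ENSG00000236246",
                      "ENSG00000254526"),
  entrezgene = NA_integer_,
  stringsAsFactors = FALSE
)

# The six trimmed IDs that were sent to BioMart
gene_ids <- c("ENSG00000236246", "ENSG00000281088", "ENSG00000254526",
              "ENSG00000223575", "ENSG00000201444", "ENSG00000232573")

# Hypothetical helper: label each query ID by what BioMart returned.
classify_ids <- function(query, result) {
  found <- query %in% result$ensembl_gene_id
  # IDs with at least one non-NA Entrez ID in the result
  mapped <- result$ensembl_gene_id[!is.na(result$entrezgene)]
  status <- ifelse(!found, "not found",
                   ifelse(query %in% mapped, "mapped", "no Entrez ID"))
  setNames(status, query)
}

classify_ids(gene_ids, res)
```

Here ENSG00000281088 comes out as "not found" and the other five as "no Entrez ID".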
Just to check that the query really works, here it is with some IDs that can be mapped:
```r
getBM(attributes = c('ensembl_gene_id',
                     'entrezgene'),
      filters = 'ensembl_gene_id',
      values = c('ENSG00000001460', 'ENSG00000008517', 'ENSG00000009724'),
      mart = mart)
```

```
  ensembl_gene_id entrezgene
1 ENSG00000001460      90529
2 ENSG00000008517       9235
3 ENSG00000009724      10747
```
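`getBM()` returns rows in its own order and silently drops IDs it can't find, so it can help to map the result back onto your input as one Entrez ID per query ID, with NA where nothing came back. A minimal sketch using base R's `match()`, with the three rows above typed in by hand and the retired ID from earlier added to the query to show the NA case; `lookup_entrez` is a hypothetical helper name. Because `match()` takes the first matching row, this keeps only one Entrez ID if an Ensembl ID maps to several.

```r
# The mapped rows returned above, typed in by hand so this runs offline.
res <- data.frame(
  ensembl_gene_id = c("ENSG00000001460", "ENSG00000008517", "ENSG00000009724"),
  entrezgene = c(90529L, 9235L, 10747L),
  stringsAsFactors = FALSE
)

# Query IDs in the order we want back; the retired ID is included to
# show how a missing row becomes NA.
query <- c("ENSG00000009724", "ENSG00000281088", "ENSG00000001460")

# Hypothetical helper: one Entrez ID per query ID, in query order.
# match() returns the first matching row, or NA if there is none.
lookup_entrez <- function(query, result) {
  setNames(result$entrezgene[match(query, result$ensembl_gene_id)], query)
}

lookup_entrez(query, res)
```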
Related post at SO: