Best tool to convert Ensembl IDs to Entrez IDs?
4.6 years ago
Morris_Chair ▴ 370

Hello everyone,

I have tried several tools to convert human Ensembl gene IDs to Entrez IDs, but each of them seems to have problems:

- BioMart recognizes only about 30% of the Ensembl IDs;
- bioDBnet: I couldn't get it to work, because it only let me select the output type;
- DAVID gives me lots of false duplicates.

What do you usually use for this task?

Thank you

gene • 20k views

I use biomaRt (for Ensembl to UniProt ID conversion) and it works fine for me. It is not 100%, and I still lose some IDs, but only a small number. Maybe try adjusting the "attributes" and "filters" arguments?


BioMart does not recognizes all the ensemble ID but barely 30%

Please show code and example data. This is highly unlikely if it is used properly, since biomaRt connects directly to Ensembl.


Hi ATpoint, here is the code:

library(biomaRt)

# Connect to the current Ensembl release, human gene dataset
mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                dataset = "hsapiens_gene_ensembl",
                host = "https://www.ensembl.org")

# new_rowname: character vector of (version-less) Ensembl gene IDs
genes <- getBM(filters = "ensembl_gene_id",
               attributes = c("ensembl_gene_id", "entrezgene_id"),
               values = new_rowname,
               mart = mart)

Could the problem be the host I am connecting to?

Thank you


Provide examples of identifiers that don't seem to map. It would also help to know where these IDs came from.
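A minimal base R sketch for listing the identifiers that fail to map, assuming the result of a getBM() call like the one in this thread is a data frame with columns ensembl_gene_id and entrezgene_id. The small data frame below is a hypothetical stand-in for real getBM() output; the IDs ending in 888888 and 999999 are made up, and the other values are only illustrative.

```r
# Hypothetical stand-ins (illustrative values, not real query results):
# new_rowname is the vector of Ensembl gene IDs that was queried,
# genes mimics the data frame getBM() returns for these two attributes.
new_rowname <- c("ENSG00000000003", "ENSG00000000005",
                 "ENSG00000888888", "ENSG00000999999")
genes <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000888888"),
  entrezgene_id   = c(7105L, 64102L, NA),
  stringsAsFactors = FALSE
)

# An ID counts as mapped if at least one returned row has an Entrez ID.
# IDs missing from the output entirely, or returned with NA, are unmapped.
mapped   <- unique(genes$ensembl_gene_id[!is.na(genes$entrezgene_id)])
unmapped <- setdiff(new_rowname, mapped)

print(unmapped)
# Fraction of queried IDs without an Entrez ID
mean(!new_rowname %in% mapped)
```

With real data, adding an attribute such as "gene_biotype" to the getBM() query is one way to check whether the unmapped IDs are, for example, non-coding RNAs.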


Thank you, genomax. Indeed, most of them are non-coding RNAs :)


BioMart, as others have mentioned, tends to work pretty well. Have you removed the version numbers from the end of the Ensembl IDs? (i.e. 1234567.89 --> 1234567)
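A minimal sketch of stripping the version suffix in base R, assuming versioned IDs have the form ID.version (e.g. as found in GENCODE-based count tables). The function name strip_ensembl_version is hypothetical, and the version numbers in the example IDs are placeholders.

```r
# Remove a trailing ".<version>" suffix (e.g. ".15") from Ensembl IDs.
# IDs without a version suffix are returned unchanged.
strip_ensembl_version <- function(ids) {
  sub("\\.[0-9]+$", "", ids)
}

# Illustrative IDs; the version numbers are placeholders.
ids <- c("ENSG00000000003.15", "ENSG00000000005", "ENSG00000000419.12")
strip_ensembl_version(ids)
# [1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419"
```

The regular expression is anchored at the end of the string, so only a final dot followed by digits is removed.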


Hi aaragak1, yes, I did.

Thanks


Yes, I did that, thanks :)


Are the ensembl IDs that you are working with derived from the current ensembl release?


Hi russh, yes, they are, thanks. Problem solved anyway: most of those Ensembl IDs were from non-coding RNAs.


Hi Arup Ghosh

Thanks for this code. Is data$entrez the new column that will hold the converted IDs? And what does row.names(data) refer to: does it mean the Ensembl IDs are stored as the row names of data? Should I specify a column there instead, e.g. data$column?

I have a CSV file containing some Ensembl IDs, and I want to convert them to Entrez IDs in a new column. I tried the packages above, but I get

Error in ENSEMBL(symb_tt3) : could not find function "ENSEMBL"

or

Unknown or uninitialised column: `gene_id`.Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : 
  mapIds must have at least one key to match against.

When I use this code:

library("AnnotationDbi")
library("org.Hs.eg.db")
symb_tt3$gene_id = mapIds(org.Hs.eg.db,
                   keys=ENSEMBL(symb_tt3),
                    column="ENTREZID",
                    keytype="ENSEMBL",
                    multiVals="first")

Please use ADD COMMENT to keep things logically organized.


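The keys argument takes the vector of Ensembl IDs to convert, i.e. the column of your data frame that contains them. ENSEMBL() is not a function, which is why you get the "could not find function" error.

library("AnnotationDbi")
library("org.Hs.eg.db")
# Replace ENSEMBL_id_column with the name of the column holding your Ensembl IDs
symb_tt3$ENTREZID_id = mapIds(org.Hs.eg.db,
                        keys = symb_tt3$ENSEMBL_id_column,
                        column = "ENTREZID",
                        keytype = "ENSEMBL",
                        multiVals = "first")

This adds a new column (ENTREZID_id) to your data frame containing the Entrez IDs.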
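The "mapIds must have at least one key" error suggests the keys vector was empty, which happens when the column name does not exist in the data frame. A minimal base R sketch of a check to run before calling mapIds(); get_ensembl_keys is a hypothetical helper, not part of AnnotationDbi, and the data frame is made up.

```r
# Hypothetical helper (not part of AnnotationDbi): fetch the Ensembl IDs from
# a data-frame column and fail early with a readable message, instead of
# letting mapIds() complain that it has no keys to match against.
get_ensembl_keys <- function(df, col) {
  if (!col %in% names(df)) {
    stop("column '", col, "' not found; available columns: ",
         paste(names(df), collapse = ", "))
  }
  # as.character() guards against factor columns (older read.csv() defaults);
  # the full column is returned so the result lines up row by row with df.
  keys <- as.character(df[[col]])
  if (all(is.na(keys) | keys == "")) {
    stop("column '", col, "' contains no IDs")
  }
  keys
}

# Made-up example data frame, e.g. as returned by read.csv()
symb_tt3 <- data.frame(
  ENSEMBL_id_column = c("ENSG00000000003", "ENSG00000000005"),
  stringsAsFactors = FALSE
)
keys <- get_ensembl_keys(symb_tt3, "ENSEMBL_id_column")
```

The returned vector can then be passed as keys = keys to mapIds(), and because its length matches nrow(symb_tt3), the result can be assigned back as a new column.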

4.6 years ago

You can use AnnotationDbi to convert Ensembl IDs. This snippet converts Ensembl gene IDs to Entrez IDs:

library("AnnotationDbi")
library("org.Hs.eg.db")
# keytypes(org.Hs.eg.db) lists the valid keytypes; columns(org.Hs.eg.db) lists the retrievable columns
data$entrez = mapIds(org.Hs.eg.db,
                    keys = row.names(data), # Ensembl gene IDs, here stored as the row names of data
                    column = "ENTREZID",
                    keytype = "ENSEMBL",
                    multiVals = "first")

Hi, is it generally acceptable to simply use multiVals="first"? How do I know which Entrez ID it returns when an Ensembl ID matches several?


By default, multiVals returns the first match. To see every mapped Entrez ID in the order they are returned, set multiVals="list".

first: This value means that when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior

Ref: https://bioconductor.org/packages/release/bioc/manuals/AnnotationDbi/man/AnnotationDbi.pdf
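A minimal sketch of how the output of multiVals="list" could be inspected to find the keys where "first" silently drops IDs, assuming mapIds() returns a named list with one character vector of Entrez IDs per Ensembl key (NA for unmapped keys). The list below is a hypothetical stand-in with placeholder keys and IDs, not real mappings.

```r
# Hypothetical stand-in for mapIds(..., multiVals = "list") output:
# one character vector of Entrez IDs per Ensembl key (placeholders only).
entrez_list <- list(
  ENSG_A = c("1001"),
  ENSG_B = c("2001", "2002"),
  ENSG_C = NA_character_
)

# Number of non-missing Entrez IDs per key
n_hits <- vapply(entrez_list, function(x) sum(!is.na(x)), integer(1))

# Keys with more than one Entrez ID: here multiVals = "first" keeps only one
multi <- names(n_hits)[n_hits > 1]

# What multiVals = "first" would keep for each key
first <- vapply(entrez_list, function(x) x[1], character(1))

print(n_hits)
print(multi)
print(first)
```

If the ambiguous keys matter for the analysis, the AnnotationDbi manual also describes other multiVals options, such as "asNA" (return NA for keys with several matches) and "filter" (drop such keys).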
