Convert gene symbol to Entrez ID
7.6 years ago
landscape95

Can somebody help me convert gene symbols to Entrez IDs in R? For example:

Gene symbol ANKRD62P1-PARP4P3 has Entrez ID 23783, as shown on this GeneCards page:

http://www.genecards.org/cgi-bin/carddisp.pl?gene=ANKRD62P1-PARP4P3&keywords=ANKRD62P1-PARP4P3

I tried biomaRt, but it doesn't have the gene (it returns NA).

I tried this:

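library(biomaRt)

ensembl <- useDataset("hsapiens_gene_ensembl", mart = useMart("ensembl"))

data <- "ANKRD62P1-PARP4P3"
# Current Ensembl releases name this attribute "entrezgene_id"
# (older releases used "entrezgene")
ans <- unique(getBM(attributes = c("hgnc_symbol", "entrezgene_id"),
                    filters = "hgnc_symbol",
                    values = data,
                    mart = ensembl))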

Your help is really appreciated!
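When a lookup like this silently returns NA (or no row), it helps to list which of the queried symbols came back without an Entrez ID, so you know which ones need another source. A minimal base-R sketch: `find_unmapped()` is a hypothetical helper, and it assumes the result columns are named `hgnc_symbol` and `entrezgene_id` (the attribute name in current Ensembl releases).

```r
# Hypothetical helper: return the input symbols that have no Entrez ID in a
# getBM()-style result with columns "hgnc_symbol" and "entrezgene_id".
find_unmapped <- function(symbols, result) {
  mapped <- result$hgnc_symbol[!is.na(result$entrezgene_id)]
  setdiff(unique(symbols), mapped)
}

# Toy result shaped like the getBM() output described above (Entrez ID is NA)
ans <- data.frame(hgnc_symbol = "ANKRD62P1-PARP4P3", entrezgene_id = NA,
                  stringsAsFactors = FALSE)
find_unmapped("ANKRD62P1-PARP4P3", ans)
# [1] "ANKRD62P1-PARP4P3"
```

The same check also works when getBM() returns zero rows, since then no symbol counts as mapped and the whole input is reported.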

Entrez R biomart • 52k views

In what way did BioMart not help?

7.6 years ago
russhh

I'd just use org.Hs.eg.db if it's a mapping within EntrezGene that you're interested in.

library(org.Hs.eg.db)
hs <- org.Hs.eg.db
my.symbols <- c("ANKRD62P1-PARP4P3")
select(hs, 
       keys = my.symbols,
       columns = c("ENTREZID", "SYMBOL"),
       keytype = "SYMBOL")
#              SYMBOL ENTREZID
# 1 ANKRD62P1-PARP4P3    23783

Hello, I am a beginner in R. Can you tell me how to retrieve the output from this 'select' method into a data frame? Thanks!


It returns a data.frame. In R you have to be pretty explicit about which function you are using, and the code above might fail if you have {dplyr} (or another package that exports a select function / method) loaded. If the above doesn't work, you could try AnnotationDbi::select() to make it more explicit which package's select function you want to use (org.Hs.eg.db implicitly imports AnnotationDbi and the select function from AnnotationDbi is dispatched on orgDb objects like org.Hs.eg.db).


But to store the result, just use assignment: gene_data <- AnnotationDbi::select(hs, keys = ..., columns = ..., keytype = ...)
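A minimal sketch of that assignment, assuming org.Hs.eg.db is installed (the requireNamespace() guard simply skips the lookup otherwise). Calling AnnotationDbi::select() with the package prefix avoids the dplyr::select() clash mentioned above; `symbols` and `gene_data` are illustrative names.

```r
symbols <- c("ANKRD62P1-PARP4P3", "TTLL10")

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Namespace-qualified call, so a loaded dplyr::select() cannot mask it
  gene_data <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                                     keys = symbols,
                                     columns = c("ENTREZID", "SYMBOL"),
                                     keytype = "SYMBOL")
  print(class(gene_data))  # [1] "data.frame"
  # Use it like any data frame, e.g. pull out one symbol's Entrez ID
  print(gene_data$ENTREZID[gene_data$SYMBOL == "TTLL10"])
}
```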


@russhh org.Hs.eg.db works well with a few genes, but I have approximately 17k gene symbols that I want to convert to Entrez IDs. When I submit the entire list, I get the following error:

library(org.Hs.eg.db)
library(AnnotationDbi)

hs <- org.Hs.eg.db
my.symbols <- myData$Symbols
AnnotationDbi::select(hs,
                      keys = my.symbols,
                      columns = c("ENTREZID", "SYMBOL"),
                      keytype = "SYMBOL")
# Error in .testForValidKeys(x, keys, keytype, fks) :
#   None of the keys entered are valid keys for 'SYMBOL'. Please use the keys method to see a listing of valid arguments.

The input file looks like this:

head(my.symbols)
# [1] " TTLL10 "  " B3GALT6 " " SCNN1D "  " PUSL1 "   " VWA1 "    " ATAD3C "
class(my.symbols)
# [1] "character"
class(myData)
# [1] "data.frame"

I tried submitting a single gene, as in your answer above, and it works well:

hs <- org.Hs.eg.db
one.symbol <- c("TTLL10")
AnnotationDbi::select(hs,
                      keys = one.symbol,
                      columns = c("ENTREZID", "SYMBOL"),
                      keytype = "SYMBOL")
# 'select()' returned 1:1 mapping between keys and columns
#   SYMBOL ENTREZID
# 1 TTLL10   254173

Any suggestions on how I can get Entrez IDs for multiple genes at once?

I also tried biomaRt and ran into a problem there too. Is there a mistake in the way I am submitting the gene list? Here is my biomaRt code:

For one gene (biomaRt):

library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
one.symbol_b <- c("TTLL10")

annotLookup <- getBM(mart = mart,
                     attributes = c("entrezgene_id", "hgnc_symbol"),
                     filters = "hgnc_symbol",
                     values = one.symbol_b)
head(annotLookup)
#   entrezgene_id hgnc_symbol
# 1        254173      TTLL10

For the entire gene list from a data frame (>17k gene symbols, biomaRt):

mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

my.symbols_b <- myData$Symbols

annotLookup <- getBM(mart = mart,
                     attributes = c("entrezgene_id", "hgnc_symbol"),
                     filters = "hgnc_symbol",
                     values = my.symbols_b)

head(annotLookup)
# [1] entrezgene_id hgnc_symbol
# <0 rows> (or 0-length row.names)

I have no idea where I am going wrong. It would be really helpful if you could point me in the right direction.

Thanks in advance.
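Judging from the head(my.symbols) output above, every symbol carries leading and trailing spaces (" TTLL10 " rather than "TTLL10"), so none of them match a SYMBOL key exactly, while the hand-typed "TTLL10" has no padding and works. A minimal sketch of the fix, assuming the symbols live in myData$Symbols as in the post: strip the whitespace with base R's trimws() before the lookup. The toy `myData` below only reuses padded values shown above, and the lookup runs only if org.Hs.eg.db is installed.

```r
# Toy stand-in for the poster's data frame, padded exactly as head() showed
myData <- data.frame(Symbols = c(" TTLL10 ", " B3GALT6 ", " SCNN1D "),
                     stringsAsFactors = FALSE)

# Remove leading/trailing whitespace so symbols can match SYMBOL keys
my.symbols <- trimws(myData$Symbols)

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  # Look up the cleaned symbols; the result is a data frame
  gene_data <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                                     keys = my.symbols,
                                     columns = c("ENTREZID", "SYMBOL"),
                                     keytype = "SYMBOL")
  print(gene_data)
}
```

The same cleaned vector can be passed as `values` to getBM(); the biomaRt query returning 0 rows is consistent with the same padding problem. Symbols that still fail after trimming may be aliases or outdated names, and org.Hs.eg.db also accepts keytype = "ALIAS" for those.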

7.6 years ago
Emily

It didn't work in BioMart because Ensembl does not have the link between ANKRD62P1-PARP4P3 and 23783, since they have different biotypes: http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000189295;r=22:16654066-16675540


Thank you! Is there any other way to get its entrez number?


For how many genes (or pseudogenes) can you not get the corresponding ID with biomaRt? I've looked at the HGNC pseudogene JSON files and cannot find a gene named ANKRD62P1-PARP4P3, although HGNC does list ANKRD62P1-PARP4P3 as a pseudogene, which corresponds to Entrez ID 23783. There is an HGNC BioMart, but it has no Entrez IDs as an attribute (or as a filter, for that matter).

7.6 years ago
library(EnsDb.Hsapiens.v86)

hsens <- EnsDb.Hsapiens.v86
my.symbols <- c("ANKRD62P1-PARP4P3")

select(hsens,
       keys = my.symbols,
       columns = c("ENTREZID", "SYMBOL", "GENEID"),
       keytype = "SYMBOL")

# ENTREZID is empty: EnsDb returns only the Ensembl gene ID for this symbol
#   ENTREZID            SYMBOL          GENEID
# 1          ANKRD62P1-PARP4P3 ENSG00000189295
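EnsDb.Hsapiens.v86 returns the Ensembl gene ID but an empty ENTREZID for this pseudogene, whereas org.Hs.eg.db (in the answer above) returns 23783. If you need both IDs, one option is to query both sources and fill the gaps in one from the other. A minimal base-R sketch using the two results shown in this thread as toy input; `fill_entrez()` is a hypothetical helper, and matching on SYMBOL assumes the symbols are current and unambiguous in both sources.

```r
# Results as printed in this thread (EnsDb: no Entrez ID; org.Hs.eg.db: 23783)
ensdb_res <- data.frame(SYMBOL = "ANKRD62P1-PARP4P3", GENEID = "ENSG00000189295",
                        ENTREZID = NA_character_, stringsAsFactors = FALSE)
orgdb_res <- data.frame(SYMBOL = "ANKRD62P1-PARP4P3", ENTREZID = "23783",
                        stringsAsFactors = FALSE)

# Hypothetical helper: fill missing Entrez IDs in `primary` from `fallback`,
# matching on SYMBOL (match() takes the first fallback row for each symbol)
fill_entrez <- function(primary, fallback) {
  idx <- match(primary$SYMBOL, fallback$SYMBOL)
  missing <- is.na(primary$ENTREZID)
  primary$ENTREZID[missing] <- fallback$ENTREZID[idx[missing]]
  primary
}

combined <- fill_entrez(ensdb_res, orgdb_res)
combined$ENTREZID
# [1] "23783"
```

Symbols with no match in the fallback keep NA, so they remain easy to spot afterwards.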