How To Convert List Of Entrez Ids Into Gene Name
3
8
11.7 years ago
grosy ▴ 100

Hi Friends,

I have a list of 10,000 Entrez IDs and I want to convert them into their respective gene names. Could someone suggest a way to do this?

With the Bioconductor packages "annotate" and "org.Hs.eg.db", we can do this for an individual gene, like this:

library(org.Hs.eg.db)
library(annotate)
lookUp('3815', 'org.Hs.eg', 'SYMBOL') 
   $`3815` 
   [1] "KIT"
lookUp('3815', 'org.Hs.eg', 'REFSEQ') 
   $`3815`
   [1] "NM_000222" "NM_001093772" "NP_000213" "NP_001087241"

I got this answer from SEQanswers, but is there any way to do this for multiple Entrez IDs?

Thanks in advance.

r genomics entrez • 60k views
ADD COMMENT
2

I think this may be one of the easiest ways to do this task. You can convert Entrez IDs into gene names using a website called "MatchMiner" (http://discover.nci.nih.gov/matchminer/MatchMinerLookup.jsp). All you need to do is upload a file that contains all your Entrez IDs. The website will convert them into HUGO gene names.

ADD REPLY
0

Thanks @hojoon.compbio it worked... :o)

ADD REPLY
0

What is the library "annotate" and how can I install it, please?

Thanks.

ADD REPLY
2

It's a Bioconductor package; details and installation instructions are here:

http://bioconductor.org/packages/release/bioc/html/annotate.html

ADD REPLY
0

Great! Which function converts gene symbols to Entrez Gene IDs, please?

Thanks.

ADD REPLY
0

Time for you to read some documentation I think :)

ADD REPLY
0

Thanks for your question, this is what I need.

ADD REPLY
17
11.7 years ago
David W 4.9k

This is an easy one - just pass a character vector that has more than one value:

getSYMBOL(c('3815', '3816', '2341'), data='org.Hs.eg')
    3815     3816     2341 
   "KIT"   "KLK1" "FNTAP2"
ADD COMMENT
0

Yes, thanks a lot :) But it doesn't seem to work for more than about 100 gene IDs, so this is what I tried:

a <- read.csv("entrez ids.csv", header = TRUE)
library(org.Hs.eg.db)
library(annotate)
d = getSYMBOL(a, data='org.Hs.eg')
Error in .checkKeysAreWellFormed(keys) : keys must be supplied in a character vector with no NAs

This is the error I get.

ADD REPLY
1

When you read data into an R session with read.csv you get a data frame containing rows and columns, not a character vector. In this case you probably have all your IDs in one column, which you can select with $, something like a$EntrezIDs. If you are new to R you should probably read some intro tutorials.
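
For illustration, here is a minimal sketch of that column extraction. The column name EntrezIDs and the inline CSV text are assumptions standing in for the real file; the lookup itself only runs if annotate and org.Hs.eg.db are installed.

```r
# Hypothetical CSV content; in practice this would come from your own file.
csv_text <- "EntrezIDs\n3815\n3816\nNA\n2341"
a <- read.csv(text = csv_text, header = TRUE)   # 'a' is a data frame

ids <- a$EntrezIDs          # pull out the single column (read in as integer)
ids <- ids[!is.na(ids)]     # drop missing values
ids <- as.character(ids)    # the lookup expects character keys

# Only attempt the lookup when the Bioconductor packages are available.
if (requireNamespace("annotate", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)
  symbols <- annotate::getSYMBOL(ids, data = "org.Hs.eg")
  print(symbols)
}
```

The key point is that getSYMBOL receives a plain character vector with no NAs rather than the whole data frame.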

ADD REPLY
1

I don't think the issue is number of IDs. I've retrieved tens of thousands of attributes (slowly) in one go using biomaRt.

ADD REPLY
0

In a loop, can you pass in a vector of 100 elements at a time? (Or perhaps you need to filter out bad/NA entries?)

ADD REPLY
0

Actually, I think the problem could be solved if I read the CSV file and store its values as a list in a variable, as shown in the Bioconductor package documentation:

"http://stuff.mit.edu/afs/athena/software/r_v2.14.1/lib/R/library/org.Hs.eg.db/html/org.Hs.egSYMBOL.html"

The only problem I am facing now is how to list each value from the CSV file.

ADD REPLY
0
d= getSYMBOL(na.omit(a), data='org.Hs.eg')
ADD REPLY
0

Hello there! I had the same issue earlier, but solved it using as.character(). Since Entrez Gene IDs are made of digits, they were initially loaded as 'integer'. Hope it helps.

ADD REPLY
0

Your answer helped me a lot, thanks +1

ADD REPLY
5
11.7 years ago
David ▴ 740

You have geneIDs that are NA.

Use mget with ifnotfound=NA:

a <- read.csv("entrez ids.csv", header = TRUE)
a.symbol <- as.vector(unlist(mget(a, envir=org.Hs.egSYMBOL, ifnotfound=NA)))
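
Note that mget needs a character vector of keys rather than the data frame returned by read.csv. A hedged sketch of the same idea on a cleaned vector follows; the example IDs are placeholders, and the lookup is skipped if org.Hs.eg.db is not installed.

```r
# Placeholder keys; in practice take them from the relevant column of the CSV.
ids <- c("3815", "3816", NA, "2341")
ids <- ids[!is.na(ids)]     # keys themselves must not contain NA

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)
  # ifnotfound = NA returns NA for well-formed keys that have no symbol,
  # instead of stopping with an error.
  hits <- mget(ids, envir = org.Hs.egSYMBOL, ifnotfound = NA)
  a.symbol <- as.vector(unlist(hits))
  print(a.symbol)
}
```

ifnotfound=NA handles IDs that are absent from the map; it does not help when the keys passed in are themselves NA or are not character, which is why the vector is cleaned first.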
ADD COMMENT
0

I'm sorry, I did that, but it still shows the same problem:

a <- read.csv("C:/Users/Desktop/entrez ids _row.csv", header = TRUE)
a.symbol <- as.vector(unlist(mget(a, envir=org.Hs.egSYMBOL, ifnotfound=NA)))

Error in .checkKeysAreWellFormed(keys) : keys must be supplied in a character vector with no NAs

ADD REPLY
0

R tells you what is wrong: "keys must be supplied in a character vector with no NAs"

Do this after read.csv, so that you pass a character vector with the NAs removed:

a <- as.character(a[!is.na(a)])
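
A small sketch of why logical negation with !is.na() is the form to use when dropping NAs, rather than arithmetic negation with -is.na(). The vector here is made up for the demonstration.

```r
x <- c(3815L, NA, 2341L)

# -is.na(x) turns FALSE/TRUE into 0/-1, so this is x[c(0, -1, 0)]:
# it removes the FIRST element and keeps the NA.
wrong <- x[-is.na(x)]

# !is.na(x) is a logical mask that keeps exactly the non-missing entries.
right <- x[!is.na(x)]

keys <- as.character(right)   # character keys, ready for the lookup
```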
ADD REPLY
1
11.7 years ago
Jordan ★ 1.3k

Another way to do this without coding is to use the ID Mapping tool in UniProt. You can just upload a list of Entrez IDs and then map them.

ADD COMMENT
0

Yes, I tried this, but what I want is to go from Entrez ID to gene name. Could you suggest which options to choose to convert from Entrez ID to gene name?

ADD REPLY
0

One roundabout way of doing it is to map the Entrez IDs to UniProt IDs and then map those to the gene names you need. But I think you already got the answer. I usually download the ID mapping file from UniProt and write my own mapping code in Perl.
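
The same do-it-yourself mapping could be sketched in R once a mapping table is loaded. The two-column table below is hypothetical (it is not the actual UniProt file layout) and reuses the symbols shown earlier in this thread.

```r
# Hypothetical mapping table; a real one would be read from a downloaded file.
mapping <- data.frame(
  entrez = c("3815", "3816", "2341"),
  symbol = c("KIT", "KLK1", "FNTAP2"),
  stringsAsFactors = FALSE
)

query <- c("2341", "3815", "9999999")   # the last ID is deliberately absent

# match() returns the row index of each query ID, or NA when it is not found.
symbols <- mapping$symbol[match(query, mapping$entrez)]
names(symbols) <- query
print(symbols)
```

IDs missing from the table come back as NA, so they are easy to spot afterwards.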

ADD REPLY
