gene ID to gene symbol in R

Sorry, this is probably a common question, but I don't even know how to start. Can somebody give me a comprehensive, step-by-step answer on how to convert gene IDs to gene symbols in R? Many thanks.

I need to convert Entrez IDs to gene names using R.

Please post a few examples.

Using EntrezDirect (there is an R package for this that you can look into):

$ esearch -db gene -query "120883619" | efetch -format docsum | xtract -pattern DocumentSummary -element Id,Name
120883619       SLC17A6-DT

Can I use biomart to convert Entrez IDs to gene symbols?

Gordon Smyth

This is indeed a commonly asked question, but more on Bioconductor than on Biostars. Bioconductor provides annotation packages that link Entrez IDs to annotation for different organisms. See for example the "Adding gene annotation" section of:

https://bioconductor.org/packages/release/workflows/vignettes/RnaSeqGeneEdgeRQL/inst/doc/edgeRQL.html

or the "Organizing gene annotation" section of

https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/limmaWorkflow.html

Both of these examples are for mouse, but the approach works the same way for other species. For human, you would substitute org.Hs.eg.db for org.Mm.eg.db.
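
For the human case, a minimal sketch of that lookup might look like the following. It assumes the Bioconductor packages AnnotationDbi and org.Hs.eg.db are installed; the three Entrez IDs are illustrative examples chosen here, not taken from the question.

```r
# Assumes the packages were installed beforehand, e.g. with
# BiocManager::install(c("AnnotationDbi", "org.Hs.eg.db"))
library(AnnotationDbi)
library(org.Hs.eg.db)

# Illustrative human Entrez Gene IDs (TP53, BRCA1, EGFR).
# Keys are passed as character strings, not numbers.
entrez_ids <- c("7157", "672", "1956")

# mapIds() returns a named character vector: names are the input IDs,
# values are the matching gene symbols (NA where an ID is not found).
symbols <- mapIds(org.Hs.eg.db,
                  keys = entrez_ids,
                  keytype = "ENTREZID",
                  column = "SYMBOL")

print(symbols)
```

For mouse, the same call should work with org.Mm.eg.db in place of org.Hs.eg.db.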

Using an Ensembl-based tool like biomart is not appropriate for Entrez IDs. While biomart will map Entrez IDs, it will lose many matches because it has to go via Ensembl IDs, which do not have a 1-1 correspondence to NCBI Entrez IDs. NCBI and Ensembl are independent annotation systems and it is best to stick with one or the other.

Alternatively, the NCBI provides "gene info" files for each species, which give concise summary annotation information for each Entrez ID in the form of a tab-delimited file:

https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/

My personal approach for gene-level NCBI-based annotation is to download the relevant gene info file and read it into R as a data.frame. That very quickly gives the definitive gene symbol, aliases, chromosome and type of gene for each Entrez ID.
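
A rough sketch of the gene info approach in base R is below. The helper names read_gene_info and entrez_to_symbol are hypothetical, made up for this illustration, and the column names (GeneID, Symbol, Synonyms, chromosome, type_of_gene) are assumed to match the header line of the NCBI file, so check them against the file you download.

```r
# Hypothetical helper: read an NCBI gene_info file (plain or gzipped).
# quote = "" stops quote characters in gene descriptions from breaking
# the parse; every column is kept as character so IDs are never
# converted to numbers.
read_gene_info <- function(path) {
  read.delim(path, quote = "", comment.char = "",
             colClasses = "character", check.names = FALSE)
}

# Hypothetical helper: look up Entrez IDs in the gene info table.
# Returns one row per requested ID, in the order given; IDs that are
# not in the table get NA in the annotation columns.
entrez_to_symbol <- function(gene_info, entrez_ids) {
  ids <- as.character(entrez_ids)
  idx <- match(ids, gene_info$GeneID)
  cols <- c("Symbol", "Synonyms", "chromosome", "type_of_gene")
  data.frame(GeneID = ids, gene_info[idx, cols],
             row.names = NULL, stringsAsFactors = FALSE)
}

# Example usage for human (left commented out because it downloads a file):
# url <- "https://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz"
# download.file(url, "Homo_sapiens.gene_info.gz", mode = "wb")
# gene_info <- read_gene_info("Homo_sapiens.gene_info.gz")
# entrez_to_symbol(gene_info, c("7157", "672"))
```

Using match() keeps the output aligned with the input IDs, which makes it easy to attach the result to an existing table of results.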
