I have some gene names (e.g. 11-cis-retinol dehydrogenase, D-2-hydroxyacid dehydrogenase (NAD+), 3alpha-hydroxysteroid 3-dehydrogenase) and their corresponding EC numbers. I want to map these gene names to other gene IDs such as Entrez Gene ID, Ensembl ID, etc.
There are more than 1,000 entries, so I can't annotate them manually.
Can someone suggest a way to map these names to ids?
Yep. It doesn't limit the query to Gene names.
It does. For example, HGNC Symbol or WikiGene Name. However, as Devon Ryan suggested, using the appropriate AnnotationDbi package would be a better fit and will give more "accurate" results.
I want to convert the gene names and not the symbols.
Got it! Use Bioconductor. It will give you gene symbols, Entrez IDs, Ensembl gene IDs, etc. for your gene names.
Okay. Sounds cool. Lemme try. Thanks
Umm, can you tell me how to implement the package?
The general idea is to make a character vector of the gene names that you want to look up and then do something like
select(org.Mm.eg.db, keys=genes, columns=c("SYMBOL","ENTREZID","ENSEMBL"), keytype="GENENAME")
which will look up the gene symbol, Entrez ID, and Ensembl ID associated with each gene name in the genes vector.
Note that this isn't a bullet-proof method. For example, it won't find any of your examples because it's expecting other names. "11-cis-retinol dehydrogenase" is also called "retinol dehydrogenase 5", for example, and that'll be found. All of these values come from Entrez, so there aren't mappings to every possible name. If this doesn't work, I'd try something from this thread: Gene Id Conversion Tool
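To make that a bit more concrete, here's a rough sketch of the lookup. It assumes you're working with mouse (org.Mm.eg.db, as above; swap in org.Hs.eg.db for human), and the two names in genes are just placeholders for your own list. Since select() stops with an error when none of the keys are valid, the sketch checks the names against the database first and keeps the unmatched ones so you can see what still needs another approach.

```r
library(org.Mm.eg.db)  # loads AnnotationDbi as well; mouse annotation assumed

# Placeholder input: replace with your own vector of gene names
genes <- c("retinol dehydrogenase 5", "11-cis-retinol dehydrogenase")

# All gene names the package knows about
known <- keys(org.Mm.eg.db, keytype = "GENENAME")

# Split the input into names that can be looked up and names that can't
found <- genes[genes %in% known]
missing <- setdiff(genes, found)

# Only call select() when there is something to look up,
# because it errors out if no key is valid
if (length(found) > 0) {
  res <- select(org.Mm.eg.db,
                keys = found,
                columns = c("SYMBOL", "ENTREZID", "ENSEMBL"),
                keytype = "GENENAME")
  print(res)
}

# Names that need a different strategy (aliases, EC numbers, ...)
print(missing)
```

One gene name can map to several Ensembl IDs, so the result may have more rows than found has entries.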
You can also try the EC numbers, which AnnotationDbi calls ENZYME. That might end up working a bit better.
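A minimal sketch of the EC route, with some assumptions: the EC numbers below are placeholders, and I'm assuming the ENZYME keys are stored as bare numbers without an "EC" prefix (check with head(keys(org.Mm.eg.db, keytype="ENZYME")) on your installation before relying on that).

```r
library(org.Mm.eg.db)  # mouse annotation assumed; use org.Hs.eg.db for human

# Placeholder EC numbers: replace with your own
ec <- c("EC 1.1.1.1", "1.1.1.315")

# Strip a leading "EC" prefix, assuming keys are bare numbers like "1.1.1.1"
ec_clean <- sub("^EC[ :]*", "", ec)

# Keep only the EC numbers present in the package
valid <- ec_clean[ec_clean %in% keys(org.Mm.eg.db, keytype = "ENZYME")]

# One EC number usually covers several genes, so ask for a list per key
if (length(valid) > 0) {
  symbols <- mapIds(org.Mm.eg.db,
                    keys = valid,
                    column = "SYMBOL",
                    keytype = "ENZYME",
                    multiVals = "list")
  print(symbols)
}
```

Because an EC number describes an enzyme activity rather than a single gene, expect several genes per number and pick among them by hand if you need a one-to-one mapping.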