This should be achievable with biomaRt in R.
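library(biomaRt)
ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
## With no filters set, this fetches every record. To query only your own GenBank accessions,
## also set filters (e.g. filters = "embl"; check listFilters(ensembl)) and pass your accession vector as values.
dat = getBM(attributes = c("protein_id", "embl", "hgnc_symbol"), values = "*", mart = ensembl)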
You can now simply match the GenBank accessions in your data against the GenBank accession IDs in dat.
dat[match(yourdata$genBank, dat$protein_id),]
I haven't tested this (I'm not 100% sure your IDs will match), but if you want to look up what biomaRt offers in the future, use the listAttributes() function. I tend to save its output to a data frame so I can grep() for terms of interest.
x = listAttributes(ensembl)
x[grep("genbank", x$description, ignore.case = TRUE),]
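The matching step above can be tried on toy data first. This is only a sketch: the accessions and symbols below are made up, `dat` and `yourdata` are stand-ins for the real tables, and stripping a trailing ".version" suffix is an assumption about how your IDs might differ from the ones in `dat`.

```r
# Toy stand-ins for the biomaRt result (dat) and your own table (yourdata).
# All accessions and symbols here are invented for illustration.
dat <- data.frame(
  protein_id  = c("AAA00001", "AAA00002", "AAA00003"),
  hgnc_symbol = c("GENE1", "GENE2", "GENE3"),
  stringsAsFactors = FALSE
)
yourdata <- data.frame(
  genBank = c("AAA00003.1", "ZZZ99999.2", "AAA00001"),
  stringsAsFactors = FALSE
)

# Assumption: your accessions may carry a ".N" version suffix; drop it before matching.
acc <- sub("\\.[0-9]+$", "", yourdata$genBank)

# match() returns NA for accessions that are not in dat,
# so unmatched rows end up with an NA symbol instead of being dropped.
yourdata$symbol <- dat$hgnc_symbol[match(acc, dat$protein_id)]
print(yourdata)
```

Keeping the NA rows makes it easy to see which accessions found no symbol, e.g. with yourdata[is.na(yourdata$symbol), ].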
The first answer in the thread you linked suggests BioMart (which is a web-based tool). There is an R version of it as well. Tutorials for BioMart are here if you are not familiar with it.
I cannot find a "genbank accession" database in BioMart, so I cannot convert a list of GenBank accession numbers to gene symbols using the biomaRt package.
Most of those appear to be cDNA clones from IMAGE and other sources. You should be able to get the gene symbols using this file from NCBI.
Thanks for your answer. How do I use it (gene2accession)? I don't know how, and I cannot open it.
You need to download and gunzip the file (it is compressed). On OS X/Unix that is simple. On Windows you will need a program such as 7-Zip.
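If you would rather stay in R, the compressed file can also be read without unzipping it by hand. Below is a rough sketch on a tiny invented file. The column names (tax_id, RNA_nucleotide_accession.version, Symbol) are assumptions about the gene2accession header, so check the first line of your download. The real file is also very large, so reading all of it may need a lot of memory.

```r
# Build a tiny gzip-compressed stand-in for gene2accession.gz.
# The rows are invented; only the tab-delimited layout with a header line
# is meant to resemble the real file.
toy <- data.frame(
  tax_id = c(9606, 9606, 10090),
  GeneID = c(1, 2, 3),
  RNA_nucleotide_accession.version = c("XX000001.1", "XX000002.2", "XX000003.1"),
  Symbol = c("GENE1", "GENE2", "Gene3"),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".gz")
con <- gzfile(path, "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# gzfile() lets read.delim() read the compressed file directly.
g2a <- read.delim(gzfile(path), stringsAsFactors = FALSE)

# Keep human rows only (NCBI taxonomy ID 9606) and strip the ".N" version suffix.
human <- g2a[g2a$tax_id == 9606, ]
human$acc <- sub("\\.[0-9]+$", "", human$RNA_nucleotide_accession.version)

# Hypothetical query accessions; the last one is not in the table.
my_acc <- c("XX000002", "XX000001", "XX999999")
symbols <- human$Symbol[match(my_acc, human$acc)]
print(data.frame(accession = my_acc, symbol = symbols))
```

With the real download, path would point at the gene2accession.gz file you saved, and my_acc would be your own accession list.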