How can I convert a list of GenBank accession numbers to gene symbols?
8.2 years ago
524129693 ▴ 20

I have a list of GenBank accession numbers, e.g. AJ001495, AF339794, AK127588, BC039327, BC035392, NR_033244.1, BC038766, CR608805, S81294. How can I convert them to gene symbols? I found that BioMart has a "refseq" dataset, but nothing for "genbank accession". I have seen the answers to "Getting Gene Names From Genbank Ids", but I cannot use Python. Can this be done in R?

R gene • 16k views

The first answer in the thread you linked suggests BioMart (a web-based tool). There is an R version of it as well (the biomaRt package). Tutorials for BioMart are here if you are not familiar with it.


I cannot find a "genbank accession" database in BioMart, so I cannot convert a list of GenBank accession numbers to gene symbols with the biomaRt package.


Most of those appear to be cDNA clones from IMAGE and other sources. You should be able to get the gene symbols using this file from NCBI (gene2accession).


Thanks for your answer. How do I use it (gene2accession)? I cannot open it.


You need to download and gunzip the file (it is compressed). On OS X/Unix that is simple. On Windows you will need a program such as 7-Zip.
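Once the file is unpacked, the lookup can be done in R. The following is only a minimal sketch, not a tested recipe against the real file: `map_accessions` is a hypothetical helper, and it assumes the table is tab-delimited, has a header line starting with `#tax_id`, and contains columns named `RNA_nucleotide_accession.version` and `Symbol` (check the header of your download). It also assumes your IDs are RNA/cDNA accessions and that you want human genes (taxonomy ID 9606).

```r
# Hypothetical helper: look up gene symbols for GenBank nucleotide accessions
# in a gunzipped gene2accession table.
# Assumed columns: "#tax_id", "RNA_nucleotide_accession.version", "Symbol".
map_accessions <- function(path, accessions, tax_id = 9606) {
  tab <- read.delim(path, header = TRUE, sep = "\t", quote = "",
                    comment.char = "",      # the header starts with '#', keep it
                    check.names = FALSE,    # keep column names as written
                    na.strings = "-",       # assumed placeholder for missing values
                    stringsAsFactors = FALSE)
  names(tab)[1] <- sub("^#", "", names(tab)[1])   # "#tax_id" -> "tax_id"

  # Keep only rows for the requested organism
  tab <- tab[which(tab$tax_id == tax_id), ]

  # Compare accessions without the ".N" version suffix on either side
  strip_version <- function(x) sub("\\.[0-9]+$", "", x)
  known <- strip_version(tab[["RNA_nucleotide_accession.version"]])
  idx <- match(strip_version(accessions), known)

  data.frame(accession = accessions,
             symbol = tab$Symbol[idx],   # NA where no row matched
             stringsAsFactors = FALSE)
}

# Usage (the file name is whatever you saved the unpacked download as):
# ids <- c("AJ001495", "AF339794", "NR_033244.1")
# map_accessions("gene2accession", ids)
```

The complete file is very large, so `read.delim()` may be slow or run out of memory. Extracting the human rows first, or using a faster reader such as `data.table::fread()`, is probably more practical.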

8.2 years ago
wiggs38 • 0

This should be achievable with biomaRt in R.

library(biomaRt)
ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

## With no filter, getBM() returns every record. To restrict the query, pass a filter
## name via filters= and a vector of your accessions via values=.
dat = getBM(attributes = c("protein_id", "embl", "hgnc_symbol"), mart = ensembl)

You can now simply match the GenBank accessions in your data against the GenBank accession IDs in dat.

## protein_id holds protein accessions; for nucleotide accessions, match against dat$embl instead
dat[match(yourdata$genBank, dat$protein_id), ]

I haven't tested this (and I am not 100% sure your IDs will match), but if you want to search for what biomaRt offers in the future, use the listAttributes() function. I tend to save the result to a data frame so I can grep() for terms of interest.

x = listAttributes(ensembl)
x[grep("Genbank", x$description),]

Thanks for your answer. I tried it, but the result is as follows:

values=c("AJ001495","AF339794")
dat=getBM(attributes=c("protein_id","embl","hgnc_symbol"),filters="protein_id",values=values, mart=ensembl)
dat
protein_id  embl        hgnc_symbol
<0 rows> (or 0-length row.names)

How can I deal with this?


Have you tried with your complete list of IDs? Does it still return an empty data frame?


Yes, I ran it on all my data, but the result is still empty. I have a question: my data are lncRNA GenBank IDs. Can I use "protein_id" as the filter? I look forward to your reply!


"my data are lncRNA GenBank IDs. Can I use "protein_id" as the filter?"

Think about that statement for a second and you will have your answer.


But I do not know how to choose the "filters" for my data.
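One way to look for a candidate filter is to search the table that biomaRt's listFilters() returns, in the same way the answer above searches listAttributes(). This is a rough sketch: `search_mart_table` is a hypothetical helper, and it relies on that table having `name` and `description` columns. Whether any filter actually covers your accessions is not guaranteed.

```r
# Hypothetical helper: keyword search over a biomaRt filter or attribute table.
# `tbl` is assumed to be the data frame from listFilters() or listAttributes(),
# with "name" and "description" columns.
search_mart_table <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, c("name", "description")]
}

# Usage (needs the biomaRt package and network access, so it is commented out):
# library(biomaRt)
# ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# search_mart_table(listFilters(ensembl), "embl|genbank|refseq")
# A filter name found this way would go into getBM(filters = ..., values = ...).
```

Searching both columns case-insensitively helps because the description text may not contain the exact word "GenBank".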


I think your issue is, as you suggested earlier, identifying the equivalent IDs in biomaRt; there is a real possibility that they aren't in there. In that case you will need to use some other method. Have you looked into the file provided by genomax2?


Thanks for your answer.
