GenBank accession to Entrez Gene ID
9.1 years ago
盛夏 ▴ 10

I have a list of GenBank accessions like this:

NM_176890.1
BX101169.1
BM716958.1
NM_173652.1
AI808125.1
AA972198.1
AA235299.1
NM_003444.1
NM_012464.3

I want to convert these to Entrez Gene IDs. I tried DAVID, BioMart, and other tools, but all of them failed. Can anyone suggest a way to do this?

Thanks very much.

NCBI Genbank • 6.9k views

Note that most GenBank accession numbers are not directly linked to an Entrez Gene ID. That said, when you tried the conversions with online tools, what exactly "failed"?


You mean that a GenBank accession like AA235299 is an EST, not a gene, so it can't be mapped?


If you open these IDs in GenBank, the record itself contains a link to Entrez Gene. For example, go to http://www.ncbi.nlm.nih.gov/genbank/, paste a query such as NM_176890 into the search box, and you will get the full record (http://www.ncbi.nlm.nih.gov/nuccore/NM_176890). If you search that page for "GeneID", you will find an entry like "GeneID:259296".


But I need to convert many GenBank accessions to Entrez Gene IDs at once. I can't query them one by one.

9.1 years ago
Guangchuang Yu ★ 2.6k

You can use clusterProfiler to convert gene IDs. See http://www.bioconductor.org/packages/3.2/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html#bitr-biological-id-translator

> acc=read.table("acc.txt", header=FALSE, stringsAsFactors=FALSE)
> acc
           V1
1 NM_176890.1
2  BX101169.1
3  BM716958.1
4 NM_173652.1
5  AI808125.1
6  AA972198.1
7  AA235299.1
8 NM_003444.1
9 NM_012464.3
> acc=sub("\\.\\d+$", "", acc[,1])
> acc
[1] "NM_176890" "BX101169"  "BM716958"  "NM_173652" "AI808125"  "AA972198"
[7] "AA235299"  "NM_003444" "NM_012464"
> require(clusterProfiler)
> bitr(acc, fromType="ACCNUM", toType="ENTREZID", annoDb="org.Hs.eg.db")
     ACCNUM ENTREZID
1 NM_176890   259296
4 NM_173652   285154
8 NM_003444     7710
9 NM_012464     7092
Warning message:
In bitr(acc, fromType = "ACCNUM", toType = "ENTREZID", annoDb = "org.Hs.eg.db") :
  55.56% of input gene IDs are fail to map...
>

Some of the accession numbers you listed are not genes and can't be mapped to Entrez Gene IDs.

You can search for them to verify. For example, AA235299.1 can be found at http://www.ncbi.nlm.nih.gov/nucest/1859736/.
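
As a rough pre-check before any mapping, the list can be split by accession format. The sketch below uses base R only and is not part of the original answer; the variable names are illustrative. It assumes that RefSeq accessions start with two uppercase letters followed by an underscore (NM_, XP_, and so on), and it treats everything else as "other GenBank records" such as ESTs. It only inspects the format and cannot tell whether a record has since been updated or removed.

```r
# Accessions from the question, including version suffixes.
acc <- c("NM_176890.1", "BX101169.1", "BM716958.1", "NM_173652.1",
         "AI808125.1", "AA972198.1", "AA235299.1", "NM_003444.1",
         "NM_012464.3")

# Drop the trailing ".<version>" (one or more digits at the end of the string).
acc_base <- sub("\\.[0-9]+$", "", acc)

# Format check (assumption): RefSeq accessions look like "XX_<digits>".
# This says nothing about whether the record is still current.
is_refseq <- grepl("^[A-Z]{2}_", acc_base)

refseq_acc <- acc_base[is_refseq]   # candidates for gene-level mapping
other_acc  <- acc_base[!is_refseq]  # ESTs and other GenBank records

print(refseq_acc)
print(other_acc)
```

For this list, the four NM_ accessions end up in `refseq_acc` and the remaining five in `other_acc`, which matches the four rows that bitr managed to map.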


OK, I see. Thanks very much!


Hi! How do we use this for accessions from multiple organisms, especially non-model ones? For example: XP_021371371.1 and PIK53657.1.


At present, using annoDb raises an error: Error in bitr(acc, fromType = "ACCNUM", toType = "ENTREZID", annoDb = "org.Hs.eg.db") : unused argument (annoDb = org.Hs.eg.db). It should therefore be replaced with OrgDb = org.Hs.eg.db.
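
A minimal sketch of the call with the newer argument name, wrapped in a small helper so that a missing package gives a clear message. The function name `accession_to_entrez` is hypothetical and not part of clusterProfiler; the sketch assumes that clusterProfiler and org.Hs.eg.db are installed from Bioconductor and that the input accessions are human and already stripped of their version suffix.

```r
# Hypothetical helper (not part of clusterProfiler): maps GenBank/RefSeq
# accessions to Entrez Gene IDs using the OrgDb argument of bitr().
accession_to_entrez <- function(acc) {
  # Both packages come from Bioconductor and must be installed beforehand.
  for (pkg in c("clusterProfiler", "org.Hs.eg.db")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required but not installed.")
    }
  }
  # Same call as in the answer above, with annoDb replaced by OrgDb.
  clusterProfiler::bitr(acc,
                        fromType = "ACCNUM",
                        toType   = "ENTREZID",
                        OrgDb    = org.Hs.eg.db::org.Hs.eg.db)
}
```

Calling `accession_to_entrez(c("NM_176890", "NM_173652"))` should return a data frame with ACCNUM and ENTREZID columns, as in the transcript above. Accessions that are missing from the annotation package are dropped with a warning.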

9.1 years ago
pevsner ▴ 420

You can copy and paste these 9 accession numbers into NCBI Nucleotide and see that five are from the EST database. Of the four that are RefSeq accessions, two have been updated and two have had their records removed.


Then what should I do with the accessions that have been removed or that come from ESTs? Just discard them?


For ESTs, although it is a bit old-fashioned, you could look at UniGene, which groups ESTs into "clusters". Many of those clusters are in turn mapped to Entrez Gene IDs.
