Difficulty retrieving NCBI taxonomic IDs from multiple accession IDs
5.7 years ago
Ming ▴ 110

Dear All,

I am trying to extract taxids from a file containing NCBI accession IDs; I have about 20,000 accession IDs.

1.) I tried following this answer: A: NCBI Accession Number to Taxonomy ID

This did not work; I got the following error message: ERROR in fetch input: Search Backend failed: read request has timed out. peer: 130.14.18.27:7011

2.) I have heard of the file: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz. I tried a few grep commands, but to no avail, and I could not find a script that looks up thousands of accession IDs.

I would really appreciate it if anyone could help.

Thanks!

ncbi • 2.4k views

See my comment here: C: NCBI Accession Number to Taxonomy ID

This appears to be a local issue with your firewall restrictions.

5.7 years ago
GenoMax 147k

2.) I have heard of the file: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz. I tried a few grep commands, but to no avail, and I could not find a script that looks up thousands of accession IDs.

If you downloaded that file, you can easily find the taxIDs using grep. Suppose your accessions are in a file called acc:

$ more acc
X68822
Z18640
Z18643
Z18642
Z18647
Z18649
Z18665
Z18651
X02323
Z18653
X59440
Z18654
X56823
X56218
X68287

$ for i in $(cat ./acc); do zgrep -m1 -w "$i" nucl_gb.accession2taxid.gz; done
X68822  X68822.1        9731    1118
Z18640  Z18640.1        9731    1121
Z18643  Z18643.1        27615   1128
Z18642  Z18642.1        27615   1129
Z18647  Z18647.1        27610   1130
Z18649  Z18649.1        27611   1135
Z18665  Z18665.1        27613   1137
Z18651  Z18651.1        27616   1151
X02323  X02323.1        9887    1160
Z18653  Z18653.1        9773    1161
X59440  X59440.1        9668    1163
Z18654  Z18654.1        27617   1164
X56823  X56823.1        9886    1165
X56218  X56218.1        452646  1168
X68287  X68287.1        9940    1194

The third column contains the taxID. I will leave it to you to extract only the accession and taxID (hint: use cut or awk).
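With about 20,000 accessions, the loop above starts decompressing the file again for every single accession, which can take a long time. One possible alternative is a single pass with awk: load the accession list into memory first, then stream the decompressed file once and print the first and third columns for every hit. This is only a sketch; the function name lookup_taxids is made up for illustration, and it assumes the accessions in acc carry no version suffix (X68822, not X68822.1), as in the listing above.

```shell
# lookup_taxids ACC_FILE MAP_GZ   (hypothetical helper name)
# Prints "accession<TAB>taxid" for each accession in ACC_FILE that appears
# in the first column of the gzipped accession2taxid file MAP_GZ.
lookup_taxids() {
    acc_file=$1
    map_gz=$2
    # awk reads ACC_FILE first (NR == FNR) and stores every accession as an
    # array key; it then reads the decompressed table from stdin ("-") and
    # prints columns 1 and 3 for rows whose first column is a stored key.
    gzip -dc "$map_gz" | awk -v OFS='\t' '
        NR == FNR { if (NF) wanted[$1]; next }
        $1 in wanted { print $1, $3 }
    ' "$acc_file" -
}

# Example call (writes a two-column table):
#   lookup_taxids acc nucl_gb.accession2taxid.gz > acc_taxid.tsv
```

The rows come out in the order they occur in the accession2taxid file, not in the order of acc, and accessions that are absent from the file are silently skipped.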

Alternatively, if you cannot find your IDs, you can look into the files prefixed dead_, which keep a trace of old IDs that either don't exist anymore or have been changed.

I had to do something similar a few months ago, and I found almost all of my accession IDs in the files listed here: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/

Take a look at the README in the same folder.

After searching several files, I still had a few accession IDs that I had to look up manually (fewer than 20, so it wasn't a hassle).
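To see which accessions still need to be chased through the dead_ files or looked up by hand, one option is to compare the original accession list against the hits collected so far. A minimal sketch, assuming the hits are saved in a file whose first column is the accession (as in the zgrep output above); the function name and the file names in the example call are hypothetical:

```shell
# missing_accessions ACC_FILE HITS_FILE   (hypothetical helper name)
# Prints every accession from ACC_FILE that does not occur in the first
# column of HITS_FILE.
missing_accessions() {
    # HITS_FILE is read first and its first column stored as array keys;
    # ACC_FILE is read second and only lines without a stored key are printed.
    # Comparing FILENAME with ARGV[1] keeps this correct even if HITS_FILE
    # is empty.
    awk '
        FILENAME == ARGV[1] { found[$1]; next }
        NF && !($1 in found)
    ' "$2" "$1"
}

# Example call:
#   missing_accessions acc hits.tsv > still_missing
```

The resulting list can then be searched in the same way against the other files in that directory, including the dead_ ones.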

Thank you very much for your advice!
