RefSeq ID to TaxID?
6.7 years ago
bird77 ▴ 80

I have a number of RefSeq IDs (for example, NZ_LFWC01000004.1) and I would like to get the taxonomy IDs (TaxIDs) of the corresponding species.

Is there a way to automate that in Bash, Python or R?

genome • 6.0k views

If you have a very long list of RefSeq IDs, you might want to do a local search against the "accession2taxid" files; see: Biostars
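A local lookup along those lines could be sketched roughly as below. This is only a minimal sketch, assuming the usual accession2taxid layout (a header line, then tab-separated accession, accession.version, taxid, gi). The function names `load_taxids` and `load_taxids_from_file` are made up for illustration, and the file path is a placeholder for whichever accession2taxid file covers your records.

```python
import gzip


def load_taxids(lines, wanted):
    """Return {accession.version: taxid} for the accessions in `wanted`.

    `lines` is any iterable of text lines in the accession2taxid layout:
    a header, then tab-separated accession, accession.version, taxid, gi.
    """
    wanted = set(wanted)
    found = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] == "accession":
            continue  # skip the header and malformed lines
        if fields[1] in wanted:
            found[fields[1]] = int(fields[2])
            if len(found) == len(wanted):
                break  # stop early once every ID is resolved
    return found


def load_taxids_from_file(path, wanted):
    # `path` is a placeholder; the files are distributed gzip-compressed.
    with gzip.open(path, "rt") as handle:
        return load_taxids(handle, wanted)
```

The file is streamed line by line, so memory use depends on the size of your ID list and not on the size of the accession2taxid file, which can be several gigabytes.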


Wonderful, that is exactly what I was looking for. Thank you so much.


Python and Biopython's Entrez module?

6.7 years ago
Sej Modha 5.3k

Using NCBI Unix eutilities:

esearch -db nucleotide -query "NZ_LFWC01000004.1" | esummary | xtract -pattern TaxId -element TaxId

Any idea about this error?

$  esearch -db nucleotide -query "NZ_LFWC01000004.1"|esummary|xtract -element TaxId

ERROR: No -pattern in command-line arguments

xtract seems to cause the error.


Try this:

esearch -db nucleotide -query "NZ_LFWC01000004.1" | esummary | xtract -pattern TaxId -element TaxId

ah, wonderful, thank you very much. :-)
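If the EDirect tools are not installed, the same esummary lookup might be approximated with the Python standard library alone. This is a rough sketch, not a tested replacement for the pipeline above. It assumes that esummary accepts an accession.version as `id` for the nucleotide database, and that the response uses the default XML layout in which each field is an `<Item Name="...">` element. The helper names `esummary_url`, `taxid_from_esummary` and `fetch_taxid` are invented for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(accession, db="nucleotide"):
    """Build the esummary request URL for one accession."""
    return EUTILS + "?" + urllib.parse.urlencode({"db": db, "id": accession})


def taxid_from_esummary(xml_data):
    """Return the TaxId found in esummary XML (str or bytes), or None.

    Assumes the default layout, where fields appear as
    <Item Name="TaxId" ...>value</Item> inside a <DocSum> element.
    """
    root = ET.fromstring(xml_data)
    for item in root.iter("Item"):
        if item.get("Name") == "TaxId" and item.text:
            return int(item.text)
    return None


def fetch_taxid(accession):
    # Needs network access; NCBI rate-limits anonymous requests,
    # so pause between calls when looping over many IDs.
    with urllib.request.urlopen(esummary_url(accession)) as response:
        return taxid_from_esummary(response.read())
```

Calling `fetch_taxid("NZ_LFWC01000004.1")` should then return the TaxID as an integer, or `None` if the response has no TaxId field.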

6.7 years ago

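In Python (with Biopython):

from Bio import Entrez
from Bio import SeqIO

Entrez.email = "myemailaddress"  # NCBI asks for a contact address
key_list = ['NZ_LFWC01000004.1']  # Add all your IDs

for key in key_list:
    handle = Entrez.efetch(db='nucleotide', id=key, rettype='gb', retmode='text')
    record = SeqIO.read(handle, 'genbank')
    handle.close()
    # The first feature is normally the 'source' feature; its db_xref list
    # may hold several cross-references, so keep only the 'taxon' entry.
    for xref in record.features[0].qualifiers.get('db_xref', []):
        if xref.startswith('taxon:'):
            print(xref.split(':')[1])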

I guess one has to be careful with the db_xref tag; it can often contain identifiers linking to other databases, such as UniProtKB.


Yes, right. I added a condition to keep only the 'taxon' entry, i.e. NCBI's taxonomy identifier.
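To make that check independent of where the taxon entry sits in the list, the filtering could be pulled into a small helper, roughly like this. It is a sketch under the assumption that db_xref values are plain "database:identifier" strings; the name `taxon_from_db_xref` is made up for illustration.

```python
def taxon_from_db_xref(xrefs):
    """Return the NCBI taxonomy ID from a list of db_xref strings, or None."""
    for xref in xrefs:
        # Split on the first colon only: "taxon:562" -> ("taxon", "562").
        database, _, identifier = xref.partition(":")
        if database == "taxon" and identifier.isdigit():
            return int(identifier)
    return None
```

With a Biopython record this would be called as `taxon_from_db_xref(record.features[0].qualifiers.get("db_xref", []))`, so cross-references to other databases are skipped no matter which position they occupy.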
