How to retrieve any and all NCBI/GenBank accession numbers from a Taxonomy ID?
8.1 years ago
yarmda ▴ 40

I want to supply a taxID for any level of phylogeny and retrieve all of the accession numbers for organisms that fit. For example, a taxID of 1063 is species-level Rhodobacter sphaeroides and has around 7 strains. Is it possible to use efetch to retrieve the accession numbers for all of their genomes?

Retrieving the taxID from an accession number is straightforward with: curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=*acc_number*&rettype=fasta&retmode=xml"

Granted, there's some grepping after the data comes back, but that's fine. I'm looking for something similar that will give back every accession number associated with the clade's tax ID.
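For anyone scripting the lookup above: with rettype=fasta and retmode=xml, efetch returns TinySeq XML, where the taxID sits in a TSeq_taxid element, so the grepping can be replaced by a small parser. Here is a minimal Python sketch of that idea. The function names are made up for illustration, and the accession passed in is whatever you would have put in place of *acc_number*.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build the same efetch request as the curl command (hypothetical helper)."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "xml"}
    return EFETCH + "?" + urlencode(params)


def taxids_from_tinyseq(xml_text):
    """Return (accession.version, taxid) pairs found in a TinySeq XML reply."""
    root = ET.fromstring(xml_text)
    return [
        (seq.findtext("TSeq_accver"), seq.findtext("TSeq_taxid"))
        for seq in root.iter("TSeq")
    ]


def fetch_taxids(accession):
    """Network call: fetch one record and pull out its taxID."""
    with urlopen(efetch_url(accession)) as resp:
        return taxids_from_tinyseq(resp.read().decode("utf-8"))
```

Calling fetch_taxids with a real accession should give back a one-element list. The parsing step is kept separate so it can be checked offline against a saved reply.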

Ideally, I would be able to include a taxID query into the eutils/efetch I have above. Is it possible to query by one of the fields returned by the above?

Since the above curl brings back data that includes taxID, could I query the nuccore database by the taxID instead of the accession number?

Does that make sense?

ncbi efetch accession-number taxid • 4.6k views

I have not found an automated solution to this yet. For now I have resolved to download the accession numbers manually from the NCBI site. Since I'm only after a handful of unchanging targets, this will suit my needs.

8.1 years ago
GenoMax 147k

See my answer in this post: Automatically Accessing all the sequences of a given order?
Since you want accession numbers, add a step 4a: under "Summary" on the left side of the page, choose "Format" --> "Accession list".
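A scripted route to a similar accession list is also possible through E-utilities. The sketch below is not the procedure from the linked post. It assumes an esearch term of the form txid1063[Organism:exp] (the taxon plus everything beneath it), stores the result on the history server with usehistory=y, and then pulls accessions in batches with efetch and rettype=acc. The function names, batch size, and pause between requests are illustrative choices.

```python
import time
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(taxid, db="nuccore"):
    """Search a database for every record under a taxID (subtree included)."""
    term = f"txid{taxid}[Organism:exp]"
    return BASE + "esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "usehistory": "y"}
    )


def parse_history(xml_text):
    """Extract (count, query_key, webenv) from an eSearchResult reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count")), root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_acc_url(query_key, webenv, retstart, retmax=500, db="nuccore"):
    """Ask efetch for one batch of plain-text accessions from the stored search."""
    return BASE + "efetch.fcgi?" + urlencode({
        "db": db, "query_key": query_key, "WebEnv": webenv,
        "rettype": "acc", "retmode": "text",
        "retstart": retstart, "retmax": retmax,
    })


def accessions_for_taxid(taxid, batch=500):
    """Network calls: return all accession.version strings for a taxID."""
    with urlopen(esearch_url(taxid)) as resp:
        count, key, env = parse_history(resp.read().decode("utf-8"))
    accessions = []
    for start in range(0, count, batch):
        with urlopen(efetch_acc_url(key, env, start, batch)) as resp:
            accessions.extend(resp.read().decode("utf-8").split())
        time.sleep(0.4)  # stay under NCBI's limit for requests without an API key
    return accessions
```

Something like accessions_for_taxid(1063) would then return the list for Rhodobacter sphaeroides and its strains. Higher-level taxIDs can match a very large number of records, so it is worth checking the count before fetching everything.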


Thanks for this! While this is a solution, I'm trying to keep everything automated in a single script - so I don't think this is quite the solution I want.

8.1 years ago
Prasad ★ 1.6k

Have you tried elink?

Here is the example output for the taxid you mentioned.
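The exact request behind that output is not shown in the thread, so the following is only a guess at an equivalent call: elink with dbfrom=taxonomy, db=nuccore, and the taxID as id. The sketch builds that URL and reads the linked IDs out of the eLinkResult XML. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(taxid, db="nuccore"):
    """Build an elink request from the taxonomy database to a target database."""
    return ELINK + "?" + urlencode({"dbfrom": "taxonomy", "db": db, "id": taxid})


def linked_ids(xml_text):
    """Collect linked record IDs, skipping the input taxID listed under IdList."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(".//LinkSetDb/Link/Id")]


def ids_for_taxid(taxid, db="nuccore"):
    """Network call: return the IDs that elink reports for this taxID."""
    with urlopen(elink_url(taxid, db)) as resp:
        return linked_ids(resp.read().decode("utf-8"))
```

Whether the default link covers strains below the given taxID is something to verify against the web results before relying on it.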


What do the IDs in the output represent?


They are the GI numbers of all entries for that particular taxid in the NCBI nucleotide database. You can change the database name accordingly, see here.
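Since the original question asks for accession numbers rather than GI numbers, the IDs still need one more step. A sketch of that conversion, assuming efetch with rettype=acc (which returns one accession.version per line as plain text). The helper names and the batch size of 200 are illustrative choices to keep each URL short.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def chunks(items, size=200):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_acc_url(ids, db="nuccore"):
    """Build an efetch request that returns accessions for a batch of IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": "acc", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def ids_to_accessions(ids, size=200):
    """Network calls: convert record IDs to accession.version strings."""
    accessions = []
    for batch in chunks(list(ids), size):
        with urlopen(efetch_acc_url(batch)) as resp:
            accessions.extend(resp.read().decode("utf-8").split())
    return accessions
```

Feeding the IDs from the elink output into ids_to_accessions should give the accession list in a single script, with no manual download.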
