I want to supply a taxID at any taxonomic rank and retrieve the accession numbers of every organism that falls under it. For example, taxID 1063 is the species-level entry for Rhodobacter sphaeroides, which has around 7 strains. Is it possible to use efetch to retrieve the accession numbers for all of their genomes?
Retrieving the taxID from an accession number is straightforward with: curl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=*acc_number*&rettype=fasta&retmode=xml"
Granted, there's some grepping after the data comes back, but that's fine. I'm looking for something similar that will give back every accession number associated with the clade's tax ID.
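For what it's worth, the grepping step could be replaced by a small XML parse. This is only a sketch: it assumes the response is in NCBI's TinySeq format and that the taxID sits in a `TSeq_taxid` element, and the sample record below (accession included) is made up for illustration rather than real efetch output.

```python
import xml.etree.ElementTree as ET


def taxid_from_tinyseq(xml_text):
    """Return the taxID found in a TinySeq XML document, or None if absent.

    Assumes the taxID is stored in a <TSeq_taxid> element.
    """
    root = ET.fromstring(xml_text)
    node = root.find(".//TSeq_taxid")
    return int(node.text) if node is not None else None


# Hypothetical, heavily trimmed response used only to exercise the parser.
SAMPLE = """<TSeqSet>
  <TSeq>
    <TSeq_accver>XX000000.1</TSeq_accver>
    <TSeq_taxid>1063</TSeq_taxid>
    <TSeq_orgname>Rhodobacter sphaeroides</TSeq_orgname>
  </TSeq>
</TSeqSet>"""

if __name__ == "__main__":
    print(taxid_from_tinyseq(SAMPLE))
```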
Ideally, I would be able to put a taxID into the eutils/efetch request I have above. Since that curl call returns data that includes the taxID, could I query the nuccore database by taxID instead of by accession number?
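One approach that may be worth trying is a two-step call: esearch with an Entrez term of the form `txid<ID>[Organism:exp]` (the `:exp` part asks for the taxon and everything beneath it), then efetch with `rettype=acc` on the IDs that come back. The sketch below is untested against the live service. The function names are my own, `retmax=200` is an arbitrary cap, and the search matches every nuccore record under the taxon, not only complete genomes, so extra filtering on the term would probably be needed.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(taxid, db="nuccore", retmax=200):
    """Build an esearch URL for all records under a taxon (subtree included)."""
    term = f"txid{taxid}[Organism:exp]"
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


def parse_ids(esearch_xml):
    """Pull the record IDs out of an esearch XML response."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.findall("./IdList/Id")]


def efetch_acc_url(ids, db="nuccore"):
    """Build an efetch URL that asks for plain-text accessions, one per line."""
    query = urllib.parse.urlencode(
        {"db": db, "id": ",".join(ids), "rettype": "acc", "retmode": "text"}
    )
    return f"{EUTILS}/efetch.fcgi?{query}"


def accessions_for_taxid(taxid):
    """Run both requests over the network and return a list of accessions."""
    with urllib.request.urlopen(esearch_url(taxid)) as response:
        ids = parse_ids(response.read())
    if not ids:
        return []
    with urllib.request.urlopen(efetch_acc_url(ids)) as response:
        return response.read().decode().split()


if __name__ == "__main__":
    # Prints the URL only; call accessions_for_taxid(1063) to hit the service.
    print(esearch_url(1063))
```

If the taxon has more records than `retmax`, the esearch step would need to be paged with `retstart`, which is left out here to keep the example short.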
Does that make sense?
I have not found an automated solution to this yet. For now I have resorted to downloading the accession numbers from the NCBI site manually. Since I'm only after a handful of unchanging targets, this will suit my needs.