I swear this question has been asked for over a decade and never satisfactorily answered.
I know the simplest answer is to perform a bunch of Entrez queries, but to quote many infomercials, "There's got to be a better way."
Here's the setup: I have a file of bare accession numbers extracted from a BLAST search, and I want to convert these to full taxonomies, e.g.
GCA_000005845.2 --> Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia
Or something similar to that. Is there any way to do this in bulk? I have a copy of the BLAST taxonomy file, but that seems to be useful only when applied during a BLAST search. Do I just have to redo all my searches with taxonomy specified?
Is there a general database? My accessions have a huge number of prefixes.
What did you BLAST against to get these accessions? Almost everything that is not a genome/assembly (i.e. non-G* accessions) should be covered by the main nucleotide database (nuccore). Entrez access is also available as a Python package (Biopython), e.g.
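A minimal sketch of what a batched lookup through Biopython could look like, assuming the accessions are nuccore accession.version identifiers (so not GCA_/GCF_ assembly accessions, which live in a different database). The helper names (`chunked`, `lineage_string`, `fetch_lineages`), the batch size, and the choice of ranks are illustrative assumptions, not anything prescribed by NCBI or Biopython. It also assumes the ESummary records expose `AccessionVersion` and `TaxId` fields and that the taxonomy records expose `LineageEx`, which is how these records usually parse with `Entrez.read`.

```python
from typing import Dict, Iterable, Iterator, Sequence

# Ranks kept in the output string (illustrative choice). NCBI has used both
# "superkingdom" and "domain" for the top rank, so both are accepted here.
RANKS = ("superkingdom", "domain", "phylum", "class", "order", "family", "genus")


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lineage_string(tax_record: dict, ranks: Iterable[str] = RANKS) -> str:
    """Join the names of the wanted ranks, in root-to-leaf order, with ';'."""
    wanted = set(ranks)
    names = [
        node["ScientificName"]
        for node in tax_record.get("LineageEx", [])
        if node.get("Rank") in wanted
    ]
    return ";".join(names)


def fetch_lineages(accessions: Iterable[str], email: str,
                   batch_size: int = 200) -> Dict[str, str]:
    """Map accession.version -> lineage string using two Entrez calls per batch."""
    # Imported here so the pure helpers above work without Biopython installed.
    from Bio import Entrez  # third-party: pip install biopython

    Entrez.email = email  # NCBI asks for a contact address
    result: Dict[str, str] = {}
    for batch in chunked(list(accessions), batch_size):
        # Step 1: accession -> taxid via ESummary on nuccore.
        handle = Entrez.esummary(db="nuccore", id=",".join(batch))
        summaries = Entrez.read(handle)
        handle.close()
        acc_to_taxid = {
            str(s["AccessionVersion"]): str(int(s["TaxId"])) for s in summaries
        }

        # Step 2: taxid -> lineage via EFetch on the taxonomy database.
        taxids = sorted(set(acc_to_taxid.values()))
        handle = Entrez.efetch(db="taxonomy", id=",".join(taxids), retmode="xml")
        tax_records = Entrez.read(handle)
        handle.close()
        by_taxid = {str(rec["TaxId"]): lineage_string(rec) for rec in tax_records}

        for acc, taxid in acc_to_taxid.items():
            # Empty string if the taxid was not returned (e.g. merged IDs).
            result[acc] = by_taxid.get(taxid, "")
    return result
```

Calling `fetch_lineages(open("accessions.txt").read().split(), email="you@example.org")` (file name and address are placeholders) would return a dict keyed by accession. This is still Entrez under the hood, but batching cuts the number of requests to two per few hundred accessions rather than one per accession.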