Hi,
I am trying to extract the sequences from the NCBI nt BLAST database. For this, blastdbcmd needs species-level taxIDs, and it tells me to use get_species_taxids.sh, so that's what I did. Since I wanted all the bacterial sequences, I used 2 as the highest taxID and thought that the following command would give me only bacterial species IDs:
get_species_taxids.sh -t 2 > bacterial_taxIDs.txt
However, the output also contains order and genus IDs along with the species IDs, so the following command fails:
blastdbcmd -db ncbi_nt_db -dbtype nucl -taxidlist bacterial_taxIDs.txt -out bacteria.fa -outfmt "%f"
Error: [blastdbcmd] Taxonomy ID(s) not found. This could be because the ID(s) provided are not at or below the species level. Please use get_species_taxids.sh to get taxids for nodes higher than species (see https://www.ncbi.nlm.nih.gov/books/NBK546209/).
Is there a way to keep only species level IDs? Thanks.
Danil
So that script is not useful for your purposes then.
So "higher" means genus and so on in this case?
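If the list really does contain ranks above species, one possible way to keep only species-level IDs is to look each ID up in the NCBI taxonomy dump. This is only a rough sketch: it assumes you have nodes.dmp from the NCBI taxdump archive (fields separated by tab, pipe, tab) and one taxID per line in your list. The function name filter_species_taxids is made up for illustration. Note that it keeps only IDs ranked exactly "species", so anything below species (strains, subspecies), which the error message says blastdbcmd also accepts, would be dropped.

```shell
# Keep only taxIDs whose rank in nodes.dmp is exactly "species".
# Assumptions: nodes.dmp is from the NCBI taxdump archive, with fields
# separated by "<tab>|<tab>"; the ID list has one taxID per line.
# filter_species_taxids is a hypothetical helper, not part of BLAST+.
filter_species_taxids() {
    nodes_file=$1   # path to nodes.dmp
    ids_file=$2     # path to the taxID list
    awk -F '\t[|]\t' '
        # First file (nodes.dmp): remember every taxID ranked "species".
        NR == FNR { if ($3 == "species") species[$1] = 1; next }
        # Second file (ID list): print only IDs seen above.
        ($1 in species)
    ' "$nodes_file" "$ids_file"
}

# Example call (file names are placeholders):
# filter_species_taxids nodes.dmp bacterial_taxIDs.txt > species_taxIDs.txt
```

The filtered file could then be passed to -taxidlist in place of the original list.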
I grabbed a random set of taxIDs from the get_species command and was able to run blastdbcmd. Do you have the taxonomy files available in the same directory where you have your nt index files?
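A quick way to check is something like the sketch below. The file names (taxdb.btd, taxdb.bti, taxonomy4blast.sqlite3) are my assumption of what ships alongside recent NCBI databases, so adjust the list for your BLAST+ version. check_taxonomy_files is just a made-up helper name.

```shell
# Report which taxonomy files are present in a BLAST database directory.
# Assumption: the file names below are the ones distributed with recent
# NCBI databases; they may differ for older BLAST+ releases.
# check_taxonomy_files is a hypothetical helper, not part of BLAST+.
check_taxonomy_files() {
    db_dir=$1       # directory holding the nt index files
    status=0
    for f in taxdb.btd taxdb.bti taxonomy4blast.sqlite3; do
        if [ -e "$db_dir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # Non-zero exit status if any file is missing.
    return "$status"
}

# Example call (the path is a placeholder):
# check_taxonomy_files /path/to/blastdb
```

If any of these are reported missing, that would be the first thing I'd look at before filtering the taxID list.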