Hi everyone, I am trying to limit my local blastp search against the nr database to bacteria only, so that I can later get the best hit for each species. To do that, I can get the taxonomy IDs for all bacteria and pass them to a local blastp run with the -taxidlist parameter. However, I realised that my taxonomy ID file, which contains bacterial tax IDs from NCBI, has "duplicated" taxonomy identifiers, i.e. separate IDs for different strains of the same species. For example, the taxonomy ID for Clostridium acetobutylicum is 1488, but there are other IDs that belong to the same species (Clostridium acetobutylicum str. ATCC 824 - 272562 // Clostridium acetobutylicum str. DSM 1731 - 991791 // Clostridium acetobutylicum str. EA 2018 - 863638).
I would like to get a file with the tax IDs of the species only (e.g. Vibrio parahaemolyticus - 670), without the strains associated with each one (e.g. 2082734, 2082733, 2082732, ... and the other 254 strains annotated in the Taxonomy database: https://www.ncbi.nlm.nih.gov/taxonomy/?term=txid670[Subtree]). Manual checking is off the table, so is there any way to accomplish that?
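To make the goal concrete, here is a minimal sketch of the kind of filter I have in mind, not a verified solution. It assumes the nodes.dmp file from the NCBI taxdump archive (pipe-delimited, with tax ID, parent tax ID and rank as the first three fields) and that Bacteria is tax ID 2. The function names and the toy data, including its shortened lineages and ranks, are hypothetical.

```python
import io

BACTERIA_TAXID = 2  # assumption: NCBI Taxonomy ID of Bacteria


def parse_nodes(handle):
    """Return {tax_id: (parent_tax_id, rank)} from nodes.dmp-style lines."""
    nodes = {}
    for line in handle:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def species_under(nodes, root=BACTERIA_TAXID):
    """Tax IDs of rank 'species' whose lineage passes through `root`."""
    keep = []
    for tax_id, (_, rank) in nodes.items():
        if rank != "species":
            continue  # drops strains, subspecies, genera, ...
        current, seen = tax_id, set()
        # Walk up the parent links until we reach `root` or the top of the tree.
        while current in nodes and current not in seen:
            if current == root:
                keep.append(tax_id)
                break
            seen.add(current)
            current = nodes[current][0]
    return sorted(keep)


# Toy data in nodes.dmp layout; lineages and ranks are simplified for illustration.
TOY_NODES = (
    "1\t|\t1\t|\tno rank\t|\n"
    "2\t|\t1\t|\tsuperkingdom\t|\n"
    "1485\t|\t2\t|\tgenus\t|\n"
    "1488\t|\t1485\t|\tspecies\t|\n"
    "272562\t|\t1488\t|\tstrain\t|\n"
    "662\t|\t2\t|\tgenus\t|\n"
    "670\t|\t662\t|\tspecies\t|\n"
    "2082734\t|\t670\t|\tstrain\t|\n"
    "9606\t|\t1\t|\tspecies\t|\n"
)

species_ids = species_under(parse_nodes(io.StringIO(TOY_NODES)))
print("\n".join(str(t) for t in species_ids))
```

On the toy data this keeps 670 and 1488 and drops the strain IDs and the non-bacterial species. With the real file, the same two functions would be called on an open nodes.dmp handle and the result written one ID per line.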
Thanks