Are all of your sequences from NCBI? If so, you can use Entrez to download the taxonomy IDs associated with your sequence IDs.
# get the IDs (sed strips the leading '>' so only the bare sequence ID remains)
grep '>' your_database.fasta | cut -f 1 -d ' ' | sed 's/^>//' > all_ids.txt
# query entrez nucleotide database - your data may be from somewhere else!
cat all_ids.txt | efetch -db nuccore -format docsum | xtract -pattern DocumentSummary -element AccessionVersion,TaxId > taxids_for_blast.txt
You can install efetch and xtract via conda:
conda install -c bioconda entrez-direct
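Entrez will not necessarily return a record for every ID, so it is worth checking which IDs ended up without a taxid before you build anything on top of the table. A minimal sketch of that check, using made-up demo files (the demo_* file names, the accessions and the taxids are only placeholders for your own all_ids.txt and taxids_for_blast.txt), and assuming the IDs carry no leading '>':

```shell
# Demo data standing in for all_ids.txt and taxids_for_blast.txt (placeholder values)
printf 'NC_000001.11\nNC_000913.3\nXX_999999.1\n' > demo_all_ids.txt
printf 'NC_000001.11\t9606\nNC_000913.3\t511145\n' > demo_taxids_for_blast.txt

# Column 1 of the taxid table holds the IDs that did get a taxid
cut -f 1 demo_taxids_for_blast.txt | sort > demo_ids_with_taxid.txt

# comm -23 keeps the lines that occur only in the first (sorted) input,
# i.e. the IDs that are missing from the taxid table
sort demo_all_ids.txt | comm -23 - demo_ids_with_taxid.txt > demo_ids_missing_taxid.txt

cat demo_ids_missing_taxid.txt   # prints XX_999999.1
```

Anything listed in the last file needs to be looked up by hand or dropped.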
You can also pull genus and species names out of your sequence headers and hope for the best, but that will bite you when you have subspecies, 'cf.' or 'sp.' names. Below I assume that every header carries exactly one two-word species name (like 'Homo sapiens') right after the ID. Many sequences have longer names like "Bacillus sp. blabla -five", in which case the code below breaks, but it's faster than querying Entrez as above.
Here I use awk to put a tab between the sequence ID and the genus and species names, then look the names up with taxonkit, which you can install via conda like entrez-direct above.
grep '>' your_database.fasta | sed 's/^>//' | awk '{print $1"\t"$2" "$3}' | taxonkit name2taxid -i 2 > names_and_taxids.txt
# tab is cut's default delimiter, so no -d option is needed
cut -f 1,3 names_and_taxids.txt > taxids_for_blast.txt
The taxonkit way is substantially faster if you have thousands of sequences but will break easily.
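To get a rough idea of how many headers will trip up the name-based approach, you can list the sequences whose second or third word is an open-nomenclature marker before running taxonkit. This is only a sketch: demo.fasta and its headers are made up, and the marker list (sp., cf., aff.) is my assumption, so extend it to whatever shows up in your data.

```shell
# Made-up headers: one clean binomial and two names that would break the lookup
printf '>seq1 Homo sapiens mitochondrion\n>seq2 Bacillus sp. blabla -five\n>seq3 Quercus cf. robur voucher\n' > demo.fasta

# Strip the leading '>', then print the ID of every header whose
# second or third word is exactly sp., cf. or aff.
grep '>' demo.fasta | sed 's/^>//' \
  | awk '$2 ~ /^(sp|cf|aff)\.$/ || $3 ~ /^(sp|cf|aff)\.$/ {print $1}' > demo_suspect_ids.txt

cat demo_suspect_ids.txt   # prints seq2 and seq3
```

If that list is long, the Entrez route is probably the safer one despite being slower.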
@Phillip Thank you for your answer. When I used the cut command, I got the error shown below.
I couldn't solve it. Could you help me?
Ah I see: cut treats the backslash and the t in "\t" as two separate characters, and the delimiter has to be a single character. Instead of typing backslash t, press Ctrl+V and then hit the Tab key once to insert a literal tab.
See this Unix & Linux Stack Exchange answer: https://unix.stackexchange.com/a/35370
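If typing a literal tab is awkward, there are two other ways to get the same result without any special keystrokes. A small sketch with a made-up three-column table (the demo_* file names and their contents are placeholders for names_and_taxids.txt):

```shell
# Made-up table: sequence ID, species name, taxid, separated by tabs
printf 'seq1\tHomo sapiens\t9606\nseq2\tMus musculus\t10090\n' > demo_names_and_taxids.txt

# Option 1: leave out -d entirely, because tab is cut's default delimiter
cut -f 1,3 demo_names_and_taxids.txt > demo_default.txt

# Option 2: let printf produce a real tab character and pass that to -d
cut -f 1,3 -d "$(printf '\t')" demo_names_and_taxids.txt > demo_explicit.txt

# Both files are identical: ID and taxid, tab-separated
cmp demo_default.txt demo_explicit.txt && cat demo_default.txt
```

Option 1 is the one I would use here, since the table is tab-separated anyway.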
It worked well. Thank you! When I created the database using extracted ids, I got the following error:
Could you help me to solve this error?
It works now. I removed the > symbol from the extracted IDs.