Hi,
I have downloaded 2,500 genome assembly FASTA files and converted them to BLAST databases using "makeblastdb". I run blastn with my query against each of them using the command below (shown for one genome file, GCA_000143925.2.fasta, as an example; I run BLAST for all of them in parallel):
blastn -query query.fasta -db GCA_000143925.2.fasta -outfmt "6 std qlen slen staxids sscinames" -task dc-megablast -out blast.out
I get this warning message:
Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
I have downloaded and extracted the taxdb.tar.gz file and added its path to my .bashrc (export TAXDB=/home/manighanipoorsamami/local/taxdb), but I still get the same warning.
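One possible cause, offered as a sketch rather than a confirmed diagnosis: BLAST+ looks for the taxdb files (taxdb.btd and taxdb.bti) in the current directory or in the directories listed in the BLASTDB environment variable, and it does not read a variable named TAXDB. Assuming the archive was extracted into the directory given above, the line in .bashrc could look like this:

```shell
# Assumption: taxdb.btd and taxdb.bti were extracted into this directory
# (the path is the one quoted in the post).
TAXDB_DIR=/home/manighanipoorsamami/local/taxdb

# Put that directory on BLASTDB, keeping any value that was already set.
export BLASTDB="$TAXDB_DIR${BLASTDB:+:$BLASTDB}"
```

After opening a new shell (or running "source ~/.bashrc"), "echo $BLASTDB" should show the directory. This only makes name lookup possible; the database itself still needs taxids assigned to its sequences.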
Also, here is the first hit line of the blast.out file:
ERV2-1_H.orn#LTR/ERV GL380075.1 81.633 49 7 1 7855 7901 77025 77073 0.018 46.4 8632 90134 0 N/A
As you can see, it does not give staxids or sscinames (the last two columns are 0 and N/A).
Then I created the file "taxid_mapping_file.txt", which maps the genome file name to its taxid:
cat taxid_mapping_file.txt
GCA_000143925.2.fasta 135651
Then I ran this:
makeblastdb -in GCA_000143925.2.fasta -taxid_map taxid_mapping_file.txt -parse_seqids -dbtype nucl
but got this error:
Building a new DB, current time: 01/25/2024 16:35:44
New DB name: /home/manighanipoorsamami/New_PhD_program/HTT_sea_snake/test/GCA_000143925.2.fasta
New DB title: GCA_000143925.2.fasta
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 3000000000B
Adding sequences from FASTA; added 3305 sequences in 1.22134 seconds.
Error: [makeblastdb] No sequences matched any of the taxids provided.
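A likely explanation for this error, stated with some caution: the -taxid_map file is expected to map sequence IDs (the first word of each FASTA header, such as GL380075.1 in the hit above) to taxids, one pair per line, not the FASTA file name. Below is a minimal sketch of how such a map could be generated. The helper name make_taxid_map is hypothetical, and the sketch assumes every sequence in the assembly belongs to the single taxid 135651 used above.

```shell
# Hypothetical helper: print "<sequence ID> <taxid>" for every FASTA header.
# $1 = FASTA file, $2 = taxid assigned to all sequences in that file.
make_taxid_map() {
    fasta="$1"
    taxid="$2"
    # For each header line, drop the leading ">" from the first word
    # and print it followed by the taxid.
    awk -v taxid="$taxid" '/^>/ { print substr($1, 2), taxid }' "$fasta"
}

# Example usage (commented out so the snippet has no side effects):
# make_taxid_map GCA_000143925.2.fasta 135651 > taxid_mapping_file.txt
```

With a map built this way, the same makeblastdb command as above (with -parse_seqids and -taxid_map taxid_mapping_file.txt) can be rerun. For 2,500 assemblies, the helper could be called once per file with that assembly's taxid.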
Could you please help me resolve this and run a BLAST command that adds "staxids" and "sscinames" to the BLAST output?
I could not find anything about this in the NCBI BLAST manuals.
Cheers,
Mani