Hello,
to decontaminate my de novo genome assembly, I'd like to BLAST my scaffolds against NCBI's core_nt database (I downloaded the preformatted one). Unfortunately, with this huge database I constantly run out of time on my cluster, even if I split the genome file into smaller pieces. To solve this, I tried the -taxids option of blastn to restrict the search to a few species within core_nt, but for some reason it doesn't work: the search still runs against the whole database and returns hits outside the selected taxIDs.
I get the following warning message: "The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details."
That page says: "If you are using your own BLAST database(s) and would like to take advantage of this feature, you must set the taxonomy IDs in your database(s) and can get the taxonomy4blast.sqlite3 database by downloading https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz , decompressing it and installing it alongside your other BLAST database(s)."
I have those files; they came with the preformatted database and are in the same folder. Also, sqlite3 is installed in my conda environment. What am I missing?
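One thing that may be worth checking before anything else is whether the taxonomy files are really readable in the folder BLAST+ looks in. The sketch below is only a sanity check, not a confirmed fix: `check_tax_files` is a hypothetical helper, the three file names are the ones I would expect to ship with the preformatted database, and exporting `BLASTDB` assumes that BLAST+ uses that variable to locate `taxonomy4blast.sqlite3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which taxonomy support files exist (and are
# non-empty) in a database folder. The file names are an assumption.
check_tax_files() {
    local db_dir="$1"
    local missing=0
    local f
    for f in taxonomy4blast.sqlite3 taxdb.btd taxdb.bti; do
        if [ -s "$db_dir/$f" ]; then
            echo "found:   $db_dir/$f"
        else
            echo "missing: $db_dir/$f"
            missing=1
        fi
    done
    # Exit status 0 only if every expected file was found.
    return "$missing"
}

# Example usage (the path is a placeholder):
# export BLASTDB="path_to_database_folder"   # assumed to help BLAST+ find the taxonomy files
# check_tax_files "$BLASTDB"
# blastn -version                            # -taxids behaviour can differ between releases
```

If all three files are reported as found and the warning persists, the BLAST+ version and the way the database path is passed to -db would be the next things to look at.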
Here is my command:
assembly="path_to_assembly"
database_used="path_to_database_folder"
taxIDs_used="185587,239422,9606"
thread_number=8
out_name="path_to_hitfiles_folder/hitfile"
blastn \
-task blastn \
-db "$database_used" \
-taxids "$taxIDs_used" \
-query "$assembly" \
-outfmt "6 qseqid staxids bitscore sseqid pident length mismatch gapopen qstart qend sstart send evalue" \
-max_target_seqs 1 \
-max_hsps 1 \
-evalue 1e-28 \
-num_threads "$thread_number" \
-mt_mode 0 \
-out "$out_name"
Thank you in advance for your help!
Thank you for the hint on how to use the -taxids option to make few-species databases with makeblastdb and blastdb_aliastool. This might be very helpful.
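For reference, a minimal sketch of how such a few-species alias database might be set up. `write_taxid_list` is a hypothetical helper, the file and database names are placeholders, and the commented-out `blastdb_aliastool` call assumes that the installed BLAST+ version supports `-taxidlist` for that tool (worth confirming with `blastdb_aliastool -help`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a comma-separated taxID string into a file with
# one taxID per line, the layout expected by the -taxidlist option.
write_taxid_list() {
    local taxids="$1"
    local out_file="$2"
    # Split on commas and drop empty lines (e.g. from a trailing comma).
    printf '%s\n' "$taxids" | tr ',' '\n' | sed '/^[[:space:]]*$/d' > "$out_file"
}

# Example usage (all names are placeholders):
# write_taxid_list "185587,239422,9606" taxids.txt
# blastdb_aliastool -db core_nt -dbtype nucl -taxidlist taxids.txt \
#     -out core_nt_subset -title "core_nt limited to selected taxIDs"
# blastn -db core_nt_subset -query "$assembly" -out hits.tsv
```

The alias only points to the original core_nt files, so it should not duplicate the sequence data on disk.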