Which additional files are needed for the -taxids option in blast?
13 days ago

Hello,

To decontaminate my de novo genome assembly, I'd like to BLAST my scaffolds against NCBI's core_nt database (I have the preformatted one). Unfortunately, with this huge database I constantly run out of time on my cluster, even if I split the genome file into smaller pieces. To solve this, I tried the -taxids option of blastn to search only selected species within core_nt, but for some reason it doesn't work: it still searches the whole database and reports hits outside the selected taxIDs.

The warning message reads: "The -taxids command line option requires additional data files. Please see the section 'Taxonomic filtering for BLAST databases' in https://www.ncbi.nlm.nih.gov/books/NBK569839/ for details."

That section says: "If you are using your own BLAST database(s) and would like to take advantage of this feature, you must set the taxonomy IDs in your database(s) and can get the taxonomy4blast.sqlite3 database by downloading https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz , decompressing it and installing it alongside your other BLAST database(s)."

I have those files; they came with the preformatted database and are in the same folder. sqlite3 is also installed in my conda environment. What am I missing?

Here is my command:

assembly="path_to_assembly"
database_used="path_to_database_folder"
taxIDs_used="185587,239422,9606"
thread_number=8
out_name="path_to_hitfiles_folder/hitfile"

blastn \
 -task blastn \
 -db "$database_used" \
 -taxids "$taxIDs_used" \
 -query "$assembly" \
 -outfmt "6 qseqid staxids bitscore sseqid pident length mismatch gapopen qstart qend sstart send evalue" \
 -max_target_seqs 1 \
 -max_hsps 1 \
 -evalue 1e-28 \
 -num_threads "$thread_number" \
 -mt_mode 0 \
 -out "$out_name"

Thank you in advance for your help!

decontamination taxids blastn • 343 views
13 days ago
JustinZhang ▴ 130

See previous topic here


Thank you for the hint on how to use the -taxids option to build few-species databases with makeblastdb and blastdb_aliastool. This might be very helpful.

13 days ago
GenoMax 150k

The -taxids option filters the BLAST search results, which still come from the entire database. It does not pre-filter the BLAST database before the search.
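To see which taxids actually end up in a hit table, something along these lines might help. It is only a sketch: it assumes a tab-separated file whose second column is staxids (as in the -outfmt string from the question), and the function name tally_staxids is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count how often each subject taxid appears in a tabular hit file.
# Assumes column 2 holds staxids, as in the -outfmt string from the question.
# tally_staxids is a hypothetical helper name, not part of BLAST+.
tally_staxids() {
    local hitfile="$1"
    # A single hit can list several taxids separated by ';', so split those
    # onto separate lines before counting, then sort by count (descending).
    cut -f2 "$hitfile" | tr ';' '\n' | sort | uniq -c | sort -k1,1nr
}

# Example call (placeholder path, not run here):
# tally_staxids path_to_hitfiles_folder/hitfile
```

Any taxid in that tally that is not in your requested list is worth a closer look before you trust the filtering.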

If you only need a certain set of taxids, extract those sequences from core_nt using blastdbcmd and then build a new local database from just those sequences. Use a custom taxid file, as shown in https://www.ncbi.nlm.nih.gov/books/NBK569841/, with that local database.
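A rough sketch of that extract-and-rebuild workflow is below. The function name build_subset_db, the output prefix, and the file names are placeholders I made up. It also assumes a reasonably recent BLAST+ in which blastdbcmd accepts -taxids, and that the %a accessions match the sequence IDs parsed from the FASTA, so check the result on a small taxid first.

```shell
#!/usr/bin/env bash
# Sketch: pull selected taxids out of a preformatted database and build a
# small local database from them. build_subset_db and all paths are
# hypothetical placeholders.
build_subset_db() {
    local source_db="$1"   # e.g. path_to_database_folder/core_nt
    local taxids="$2"      # comma-separated, e.g. 185587,239422,9606
    local out_prefix="$3"  # prefix for the new database files

    # 1) Dump the matching sequences as FASTA (%f).
    blastdbcmd -db "$source_db" -taxids "$taxids" -outfmt "%f" \
        -out "${out_prefix}.fasta" || return 1

    # 2) Write an "accession taxid" table (%a = accession, %T = taxid)
    #    so the new database keeps its taxonomy annotation.
    blastdbcmd -db "$source_db" -taxids "$taxids" -outfmt "%a %T" \
        -out "${out_prefix}.taxid_map.txt" || return 1

    # 3) Build the local nucleotide database with the taxids attached.
    makeblastdb -in "${out_prefix}.fasta" -dbtype nucl -parse_seqids \
        -taxid_map "${out_prefix}.taxid_map.txt" -out "$out_prefix"
}

# Example call (placeholders, not run here):
# build_subset_db path_to_database_folder/core_nt 185587,239422,9606 subset_db
```

After that, point blastn at the new prefix with -db instead of core_nt; the search should be much faster because the database is far smaller.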


Ok, so I was wrong about what -taxids does. Thanks a lot for the clarification! I will try what you suggest; this sounds great.
