How is taxonomy information injected into BLAST databases?
My application requires me to rebuild nr from the FASTA file (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/) because I need to make some custom changes to the sequence headers.
In that file the headers do not seem to carry any taxonomy information other than the organism name in brackets, like [Bacillus]. That doesn't seem to be enough to extract sequences by taxid with blastdbcmd, like this:
$ blastdbcmd -db nr -entry all -outfmt "%g %T" |          # GI and taxid of every sequence
awk '$2 == 9606 { print $1 }' |                           # keep GIs whose taxid is 9606 (human)
blastdbcmd -db nr -entry_batch - -out human_sequences.txt # fetch those sequences by GI
There is a -taxid_map option in makeblastdb, but where do I get the mapping file?
I guess a simpler way to ask my question is: what command does NCBI use to build their nr database from the nr FASTA file?
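For reference, here is a minimal sketch of building a protein database with taxids attached. It is not necessarily the command NCBI itself uses. The file names nr.fasta, nr.taxid_map and nr_custom are hypothetical, tiny stand-in inputs are written so the script runs on its own, and makeblastdb is only invoked if it is installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins: nr.fasta is your edited FASTA file and nr.taxid_map
# holds one "seqid taxid" pair per line, separated by whitespace.
cat > nr.fasta <<'EOF'
>seq1 hypothetical protein [Homo sapiens]
MKTAYIAKQRQISFVKSHFSRQ
EOF
printf 'seq1 9606\n' > nr.taxid_map

if command -v makeblastdb >/dev/null 2>&1; then
  # -parse_seqids makes the first word of each header the sequence id,
  # which is what the entries in the -taxid_map file are matched against.
  makeblastdb -in nr.fasta -dbtype prot -parse_seqids \
              -taxid_map nr.taxid_map -title "custom nr" -out nr_custom
else
  echo "makeblastdb not found; skipping the database build" >&2
fi
```

If the build succeeds, the first blastdbcmd command in the pipeline above, pointed at nr_custom and with %a (accession) in place of %g, should print the taxid next to each sequence id.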
Thanks for your suggestion. It works for the preformatted nr database downloaded from NCBI. However, I need to build the nr database from scratch from the FASTA file (because I need to add some information to the sequence headers in the nr FASTA file).
My question is similar to this one: how to makeblastdb with taxon id's
Any thoughts on this?
I believe that as long as the accession numbers stay the same you'd get the same behavior, so you would not need to do anything in particular. My expectation is that the BLAST taxdb is indexed by accession numbers.
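If the goal is to keep the accessions unchanged while adding information to the headers, one option is to leave the first word of each header alone and append the extra text after it. A minimal sketch, assuming single-defline headers; input.fasta, edited.fasta and the tag my_tag=example are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the nr FASTA file.
cat > input.fasta <<'EOF'
>seq1 hypothetical protein [Homo sapiens]
MKTAYIAKQRQISFVKSHFSRQ
>seq2 hypothetical protein [Bacillus subtilis]
MSTNPKPQRKTKRNTNRRPQDVKFPGG
EOF

# Append a custom tag to every header line; the first word (the sequence id
# that -parse_seqids and -taxid_map depend on) is left untouched.
# Sequence lines are printed unchanged.
awk '/^>/ { print $0 " my_tag=example"; next } { print }' input.fasta > edited.fasta
```

Because only the text after the id changes, the same taxid map should still apply to the edited file.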
If you also wanted to build your own custom taxonomy, then you'd have to build it with makeblastdb, as you suspect. The mapping files are at: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
Also, there are a lot of files in there. Which one should I download to get the mapping needed for the [-taxid TaxID] [-taxid_map TaxIDMapFile] options of makeblastdb?
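One candidate in that directory is accession2taxid/prot.accession2taxid.gz. The sketch below turns it into the two-column "seqid taxid" file that -taxid_map reads. It assumes the table is tab-separated with a header line and that its first three columns are accession, accession.version and taxid, and that accession.version matches the first word of your FASTA headers. A tiny stand-in with hypothetical ids is written so it runs offline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for prot.accession2taxid (tab-separated, header line first).
printf 'accession\taccession.version\ttaxid\n' >  prot.accession2taxid
printf 'seqA\tseqA.1\t9606\n'                  >> prot.accession2taxid
printf 'seqB\tseqB.1\t1423\n'                  >> prot.accession2taxid

# Skip the header and print "accession.version taxid" on each line.
# For the real compressed file the equivalent would be:
#   zcat prot.accession2taxid.gz | awk -F'\t' 'NR > 1 { print $2, $3 }' > nr.taxid_map
awk -F'\t' 'NR > 1 { print $2, $3 }' prot.accession2taxid > nr.taxid_map
cat nr.taxid_map
```

The full table covers far more accessions than nr, so filtering it down to the ids present in your FASTA file first may save memory and time.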
Hello Istvan,
I would also like to run BLAST with taxonomy information using the swissprot database, but I haven't found detailed instructions for doing it. First I tried to download the taxonomy database as instructed, but I got an error:
perl update_blastdb.pl taxdb --decompress
Connected to NCBI
taxdb not found, skipping.
Can you tell me what I should do first? It seems like you know how to do this. Thank you!