Taxonomy information in nr database
6.6 years ago
navela78 ▴ 70

How is taxonomy information injected into BLAST databases?

My application requires me to rebuild nr from the FASTA file (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/) because I need to make some custom changes to the sequence headers.

In that file the headers do not seem to carry any taxonomy information other than the name of a taxonomic rank in brackets, like this: [Bacillus]. That doesn't seem to be enough to perform taxid-based extractions with blastdbcmd like this:

$ blastdbcmd -db nr -entry all -outfmt "%g %T" | \
   awk ' { if ($2 == 9606) { print $1 } } ' | \
   blastdbcmd -db nr -entry_batch - -out human_sequences.txt

There is an option called taxid_map in makeblastdb but where do I get the mapping file?

I guess a simpler way to ask my question is: what command does NCBI use to build their nr database from the nr FASTA file?

nr blast • 4.7k views
6.6 years ago

You have to download the taxonomy database as well. You can use the update script for that:

update_blastdb.pl taxdb --decompress

Then ensure that blast can automatically access that information as well:

export BLASTDB=$BLASTDB:~/location/of/taxdb
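A quick way to confirm that the taxonomy files are actually reachable through BLASTDB is sketched below. It assumes the decompressed taxdb consists of the two files taxdb.bti and taxdb.btd; find_taxdb is a hypothetical helper name, not part of BLAST+.

```shell
#!/bin/sh
# Sketch: report the first directory listed in BLASTDB that holds the taxdb files.
# Assumption: the decompressed taxdb consists of taxdb.bti and taxdb.btd.

# find_taxdb is a hypothetical helper, not a BLAST+ command.
# $1 is a colon-separated list of directories, as in the BLASTDB variable.
find_taxdb() {
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        if [ -f "$dir/taxdb.bti" ] && [ -f "$dir/taxdb.btd" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

if found=$(find_taxdb "${BLASTDB:-}"); then
    echo "taxdb found in: $found"
else
    echo "taxdb.bti/taxdb.btd not found in any BLASTDB directory" >&2
fi
```

If the check fails, the directory passed to export above is probably not the one where the script unpacked the files.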

Thanks for your suggestion. It works for the preformatted nr database downloaded from NCBI. However, I need to build the nr database from scratch from the FASTA file (because I need to add some information to the sequence headers in the nr FASTA file).

My question is similar to this one: "how to makeblastdb with taxon id's".

Any thoughts on this?


I believe that as long as the accession numbers are the same you'd get the same behavior, hence you would not need to do anything in particular. My expectation is that the blast TaxDB is indexed by accession numbers.

If you also wanted to build your own custom taxonomy, then you'd have to build it with makeblastdb, as you suspect. The mapping files are at:

ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
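As a rough sketch of how one of those files might be turned into the input for -taxid_map: the example below assumes the protein table prot.accession2taxid.gz from the accession2taxid subdirectory of that FTP location, with tab-separated columns accession, accession.version, taxid, gi and a header line. It also assumes the edited FASTA headers still begin with the accession.version, so that makeblastdb -parse_seqids can match them. make_taxid_map and the file names nr_custom.fasta and taxid_map.txt are made up for illustration, and the sample rows are invented.

```shell
#!/bin/sh
# Sketch: derive a makeblastdb -taxid_map file from an accession2taxid table.
# Assumption: the input is tab-separated with a header line and the columns
#   accession, accession.version, taxid, gi

# make_taxid_map is a hypothetical helper name, not part of BLAST+.
make_taxid_map() {
    # Skip the header line; print "accession.version taxid" for every row.
    awk -F '\t' 'NR > 1 { print $2, $3 }' "$1"
}

# Tiny invented sample so the sketch runs without downloading anything.
workdir=$(mktemp -d)
sample="$workdir/sample.accession2taxid"
printf 'accession\taccession.version\ttaxid\tgi\n' >  "$sample"
printf 'ABC00001\tABC00001.1\t9606\t0\n'            >> "$sample"
printf 'XYZ00002\tXYZ00002.2\t1386\t0\n'            >> "$sample"

make_taxid_map "$sample" > "$workdir/taxid_map.txt"
cat "$workdir/taxid_map.txt"

# With the real table and an edited FASTA (file names are placeholders):
#   gunzip -c prot.accession2taxid.gz > prot.accession2taxid
#   make_taxid_map prot.accession2taxid > taxid_map.txt
#   makeblastdb -in nr_custom.fasta -dbtype prot -parse_seqids \
#       -taxid_map taxid_map.txt -out nr_custom
```

Whether this reproduces what NCBI does internally for nr is not something the thread establishes. It only shows one way to supply a sequence-ID-to-taxid mapping to makeblastdb.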


Also, there are a lot of things in there. What should I download to get the [-taxid TaxID] [-taxid_map TaxIDMapFile] input needed for makeblastdb?


Hello Istvan,

I would also like to run BLAST with taxon information, using the swissprot database, but I haven't found detailed instructions for doing it. The first thing I did was to download the taxonomy database as instructed, but I got an error:

perl update_blastdb.pl taxdb --decompress
Connected to NCBI
taxdb not found, skipping.

Can you tell me what I should do first? It seems like you know how to do this. Thank you!
