How to run makeblastdb with taxon IDs
7.2 years ago
gb ★ 2.2k

Dear all,

I extracted a subset of sequences from the nt database and I want to make a blast database from those sequences that includes their taxon IDs.

If I execute the following command:

./ncbi-blast-2.6.0+/bin/makeblastdb -in ntselection.fa -dbtype nucl -taxid_map nucl_gb.accession2taxid -parse_seqids

I get no errors and the database works with blast, but with the output option -outfmt '6 qseqid sseqid stitle sgi sacc pident length qlen evalue bitscore staxids' there are no taxon IDs in the output.

What are the correct command and input files? If I use gi_taxid_nucl.dmp I also get no errors, but no blast database is made.

makeblastdb blast • 4.8k views

If you use the search, you'll find that this has been asked before, and at least back then there was no direct solution. However, it's very easy to add that information to your output after running blast. Just use join and sort, e.g.:

join -t $'\t' -1 1 -2 1 -o 1.1,1.2,1.3,...,2.2 \
    <(sort -t $'\t' -k1,1 blastoutput) \
    <(sort -t $'\t' -k1,1 nucl_gb.accession2taxid)
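
To make the join above concrete, here is a minimal sketch. It assumes the blast output uses the -outfmt from the question (so sacc is column 5) and that nucl_gb.accession2taxid has its usual four tab-separated columns (accession, accession.version, taxid, gi) with a header line, which puts the taxid in column 3 of that file. The file names and sample rows are made up for illustration, and temporary files are used instead of process substitution so it also runs under plain sh.

```shell
set -eu
export LC_ALL=C              # sort and join must use the same collation order
TAB=$(printf '\t')

# Made-up sample data: two blast hits in the question's 11-column format.
printf 'q1\tAB000001.1\texample title one\t1\tAB000001\t99.0\t100\t100\t1e-50\t180\tN/A\n' > blast_demo.tsv
printf 'q2\tZZ000009.1\texample title two\t9\tZZ000009\t95.0\t100\t100\t1e-40\t150\tN/A\n' >> blast_demo.tsv

# Made-up sample of the four-column accession2taxid layout, with header.
printf 'accession\taccession.version\ttaxid\tgi\n' > acc2taxid_join_demo.tsv
printf 'AB000001\tAB000001.1\t9606\t1\nAB000002\tAB000002.1\t10090\t2\n' >> acc2taxid_join_demo.tsv

# Sort both inputs on their join field; drop the header of the map first.
sort -t "$TAB" -k5,5 blast_demo.tsv > blast_sorted.tsv
tail -n +2 acc2taxid_join_demo.tsv | sort -t "$TAB" -k1,1 > map_sorted.tsv

# Join blast column 5 (sacc) to map column 1 (accession).
# Keep blast columns 1-10 and put the taxid (map column 3) last.
# -a 1 keeps hits with no match; -e fills their taxid with N/A.
join -t "$TAB" -1 5 -2 1 -a 1 -e 'N/A' \
    -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,2.3 \
    blast_sorted.tsv map_sorted.tsv > blast_with_taxid.tsv
```

Without -a 1, join silently drops hits whose accession is missing from the map, which may or may not be what a downstream pipeline expects.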

It sounded as though upgrading to 2.6.0 was the solution. If I use your solution I have to adjust someone else's pipeline, which I was trying to avoid. According to the blast documentation it should be possible.

Anyway, thanks for the answer.


Have you tried using the blast taxdb provided on the blastdb website? Ref: https://www.ncbi.nlm.nih.gov/books/NBK279680/


The taxdb is for retrieving the scientific name from the taxid. I need the taxid itself in my blast output. In the last column I currently see only 'N/A'. But I know that taxids are available, because I can find them in the file nucl_gb.accession2taxid (ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/).


I agree. Taxonomy IDs exist in the pre-formatted blast databases, but for a custom db it might be trickier to incorporate them. Please check your blast version; see the post "How to make a custom blast db with taxon IDs from a taxid_map file".


I am aware of that post and am already using blast 2.6.0, and it is still not working. If I use a taxon map file in the same format as in that post with blast 2.6.0, I get no errors, but no database is made either. I think I will try blast 2.3.0 with one of the solutions.


I have some weird results now. For test purposes I used a single sequence to make the blast database.

If I use an accession_taxonid file consisting of one line (the accession and taxon id of that single sequence) it works!

But if I use the same makeblastdb command with the complete accession_taxonid file, no database is made... Maybe the accession_taxonid file must have the same length and order as the input sequences.
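
One possible reading of this result: the makeblastdb help text describes -taxid_map as a text file with two columns per line (sequence ID, then taxonomy ID), whereas nucl_gb.accession2taxid has four columns, a header line, and entries for far more sequences than the input. Whether that explains the missing database here is a guess, but a small two-column map restricted to the input sequences is cheap to build. A minimal sketch, assuming the FASTA headers start with an accession.version identifier; the file names and sample data are made up:

```shell
set -eu

# Made-up sample FASTA; headers are assumed to start with accession.version.
printf '>AB000001.1 example sequence one\nACGTACGT\n' > ntselection_demo.fa
printf '>AB000002.1 example sequence two\nGGCCTTAA\n' >> ntselection_demo.fa

# Made-up sample of the four-column accession2taxid layout, with header.
printf 'accession\taccession.version\ttaxid\tgi\n' > acc2taxid_demo.tsv
printf 'AB000001\tAB000001.1\t9606\t1\n' >> acc2taxid_demo.tsv
printf 'AB000002\tAB000002.1\t10090\t2\n' >> acc2taxid_demo.tsv
printf 'AB000003\tAB000003.1\t7227\t3\n' >> acc2taxid_demo.tsv

# Pass 1 (FASTA): remember each sequence ID (text after '>' up to the first blank).
# Pass 2 (map): skip the header, print "accession.version taxid" for remembered IDs.
awk -F'\t' '
    FNR == NR {
        if (/^>/) {
            split(substr($0, 2), h, /[ \t]/)
            want[h[1]] = 1
        }
        next
    }
    FNR > 1 && ($2 in want) { print $2, $3 }
' ntselection_demo.fa acc2taxid_demo.tsv > taxid_map_demo.txt

# With BLAST+ installed, the reduced map could then be tried with, e.g.:
#   makeblastdb -in ntselection_demo.fa -dbtype nucl -parse_seqids -taxid_map taxid_map_demo.txt
```

If the headers in your FASTA use another ID style (for example the older gi|...|gb|...| form), the ID extraction in the first pass would need to be adapted so that the IDs in the map match what -parse_seqids reads from the headers.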
