Question

Is it possible to get taxonomy identifiers from diamond output without using --taxonmap during makedb?

5.4 years ago

O.rka ▴ 740

I want to run diamond and also get taxon identifiers for each hit. Is incorporating them during the makedb step the only way to do this, or is there another option? I'm asking because I would have to ask IT to recreate the databases, so it would be nice if I could do it post hoc. I have my own scripts for going from taxon ID to species/genus/etc., but I still need the taxonomy identifier.

Command

source activate diamond_env
diamond blastp -f 6 qseqid sseqid pident nident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle -o cylindrotheca.diamond.nr.blast6 -p 16 -d /usr/local/db/diamond/nr.dmnd -e 0.001 -q ./assembly_cylindrotheca/assembly.orf.faa

Output

diamond v0.9.29.130 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database...  [0.040975s]
Error: Output format requires taxonomy mapping information built into the database (use --taxonmap parameter for the makedb command)

Can I use any of these files?

(diamond_env) -bash-4.1$ ls /usr/local/db/taxdb/taxdb_current/
taxdb.btd  taxdb.bti

(diamond_env) -bash-4.1$ ls /usr/local/scratch/METAGENOMICS/jespinoz/db/ncbi_taxonomy/
citations.dmp  delnodes.dmp  division.dmp  gc.prt  gencode.dmp  merged.dmp  names.dmp  nodes.dmp  readme.txt  taxdump.tar.gz

diamond blast protein alignment taxonomy • 7.8k views

updated 18 months ago by katieostrouchov ▴ 30 • written 5.4 years ago by O.rka ▴ 740

Have you solved this problem?

5.2 years ago by yinbinqiu • 0

No, I gave up. Have you?

5.2 years ago by O.rka ▴ 740

score 3 · Answer 1 · 2020-10-05

4.5 years ago

Sej Modha 5.3k

To incorporate taxonomy information into the diamond database, you have to supply the prot.accession2taxid mapping file, together with nodes.dmp, when you build the database.

More info: http://www.diamondsearch.org/index.php?pages/command_line_options/

This can be achieved by using the following command:

 diamond makedb --in nr --db nr_diamond --taxonmap prot.accession2taxid --taxonnodes nodes.dmp --threads 20
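
For the post hoc route the question asks about, one possibility is to join the sseqid column of an existing tabular output against prot.accession2taxid after the search. This is only a minimal sketch. It assumes the subject IDs are plain accession.version strings (column 2 of the mapping file, with the taxid in column 3) and that the search was run without the staxids field. The demo file names, accessions, and taxids below are made up for illustration.

```shell
# Demo stand-in for prot.accession2taxid (hypothetical accessions and taxids).
# Columns: accession, accession.version, taxid, gi
printf 'accession\taccession.version\ttaxid\tgi\n' > demo.accession2taxid
printf 'AAA00001\tAAA00001.1\t9606\t0\n' >> demo.accession2taxid
printf 'BBB00002\tBBB00002.1\t562\t0\n' >> demo.accession2taxid

# Demo stand-in for a blast6 output: qseqid, sseqid, pident
printf 'q1\tAAA00001.1\t98.5\n' > demo.blast6
printf 'q2\tCCC00003.1\t77.0\n' >> demo.blast6

# Pass 1: collect the subject accessions that occur among the hits.
# Pass 2: stream the mapping file, keeping taxids only for those accessions.
# Pass 3: re-read the hits and append the taxid (or NA) as a new last column.
awk -F'\t' -v OFS='\t' '
    FNR == 1  { pass++ }
    pass == 1 { wanted[$2] = 1; next }
    pass == 2 { if ($2 in wanted) taxid[$2] = $3; next }
    { print $0, (($2 in taxid) ? taxid[$2] : "NA") }
' demo.blast6 demo.accession2taxid demo.blast6 > demo.blast6.taxid

cat demo.blast6.taxid
```

With real data, the demo files would be replaced by the DIAMOND output and the decompressed prot.accession2taxid. Only accessions that actually occur among the hits are held in memory, because the full mapping file is very large.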

Where do you get this file: prot.accession2taxid?

Edit: wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz

3.7 years ago by O.rka ▴ 740

The link above is dead now. In case someone needs it, the command line options are now documented here: https://github.com/bbuchfink/diamond/wiki/3.-Command-line-options

The taxonomy nodes.dmp and names.dmp files can be retrieved with:

wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
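
Once the archive is unpacked, names.dmp can be turned into a simple taxid-to-name table. As a rough sketch: fields in the .dmp files are separated by tab, pipe, tab, and each line ends with tab, pipe. The rows below are a small hand-written stand-in for the real file, and the demo file names are hypothetical.

```shell
# Hand-written stand-in for names.dmp.
# Columns: tax_id | name_txt | unique name | name class
printf '562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n' > demo.names.dmp
printf '9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n' >> demo.names.dmp
printf '9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n' >> demo.names.dmp

# Strip the trailing "<tab>|", split on "<tab>|<tab>", and keep only
# the rows whose name class is "scientific name".
awk -F'\t[|]\t' -v OFS='\t' '
    { sub(/\t[|]$/, "") }
    $4 == "scientific name" { print $1, $2 }
' demo.names.dmp > demo.taxid2name.tsv

cat demo.taxid2name.tsv
```

The resulting two-column table can be joined against taxids from any source, whether they come from a database built with --taxonmap or from a separate lookup.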