Is it possible to get taxonomy identifiers from diamond output without using --taxonmap during makedb?
4.9 years ago
O.rka ▴ 740

I want to run diamond and also get taxon identifiers for each hit. Is incorporating them during the makedb step the only way, or is there another option? I'm asking because I would otherwise have to ask IT to recreate the databases, and it would be great if I could do it post hoc. I have my own scripts for going from taxon ID to species, genus, etc., but I still need the taxonomy identifier.

Command

source activate diamond_env
diamond blastp -f 6 qseqid sseqid pident nident length mismatch gapopen qstart qend sstart send evalue bitscore staxids stitle -o cylindrotheca.diamond.nr.blast6 -p 16 -d /usr/local/db/diamond/nr.dmnd -e 0.001 -q ./assembly_cylindrotheca/assembly.orf.faa

Output

diamond v0.9.29.130 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 16
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory:
Opening the database...  [0.040975s]
Error: Output format requires taxonomy mapping information built into the database (use --taxonmap parameter for the makedb command)

Can I use any of these files?

(diamond_env) -bash-4.1$ ls /usr/local/db/taxdb/taxdb_current/
taxdb.btd  taxdb.bti

(diamond_env) -bash-4.1$ ls /usr/local/scratch/METAGENOMICS/jespinoz/db/ncbi_taxonomy/
citations.dmp  delnodes.dmp  division.dmp  gc.prt  gencode.dmp  merged.dmp  names.dmp  nodes.dmp  readme.txt  taxdump.tar.gz

Have you solved this problem?


No I gave up. Have you?

4.1 years ago
Sej Modha 5.3k

To incorporate taxonomy information into the diamond database, you have to supply prot.accession2taxid together with nodes.dmp when you build the database.

More info: http://www.diamondsearch.org/index.php?pages/command_line_options/

This can be achieved by using the following command:

 diamond makedb --in nr --db nr_diamond --taxonmap prot.accession2taxid --taxonnodes nodes.dmp --threads 20
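
If rebuilding the database is not an option, the same prot.accession2taxid file can in principle be joined against the sseqid column after the search. Below is a minimal sketch of that idea, not a diamond feature: `add_taxids` is a hypothetical helper name, and it assumes that column 2 of the tabular output holds a bare accession.version (e.g. `WP_012345678.1`) and that the mapping file has the usual tab-separated columns accession, accession.version, taxid, gi. It first reads the hits to learn which accessions are needed, so only those are kept in memory while the large mapping file is streamed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of diamond): append a taxid column to tabular
# diamond output by looking up each sseqid in prot.accession2taxid.
# Usage: add_taxids HITS.blast6 prot.accession2taxid[.gz] > hits.with_taxid.tsv
add_taxids() {
    local hits="$1" acc2taxid="$2"
    local reader="cat"
    # Assumption: a .gz suffix means the mapping file is gzip-compressed.
    case "$acc2taxid" in *.gz) reader="gzip -dc" ;; esac

    # pass 1: read the hits and remember every sseqid (column 2).
    # pass 2: stream the mapping from stdin, keep taxids for wanted accessions.
    # pass 3: re-read the hits and append the taxid, or NA if none was found.
    $reader "$acc2taxid" | awk -F'\t' -v OFS='\t' '
        pass == 1 { wanted[$2] = 1; next }
        pass == 2 { if ($2 in wanted) taxid[$2] = $3; next }
        pass == 3 { print $0, (($2 in taxid) ? taxid[$2] : "NA") }
    ' pass=1 "$hits" pass=2 - pass=3 "$hits"
}
```

For example, `add_taxids cylindrotheca.diamond.nr.blast6 prot.accession2taxid.gz > hits.with_taxid.tsv` would work once the search has been run without the `staxids` and `stitle` fields. Note that this gives only the taxid of the accession shown in sseqid, whereas `staxids` from a taxonomy-aware database can list several taxids per hit.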

Where do you get this file: prot.accession2taxid?

Edit: wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz


The link above is dead now. In case someone needs it, the command-line options can now be found here: https://github.com/bbuchfink/diamond/wiki/3.-Command-line-options

The taxonomy nodes.dmp and names.dmp files can be retrieved with:

wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip