Hello,
I recently downloaded the nr database from NCBI and set it up with Diamond. I then ran my sequences through it with the taxonomy options enabled, using the following command lines:
diamond makedb --in nr.gz --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp -d nr
diamond blastp -d /srv/scratch/nrDatabase/nr.dmnd -q COG0202.faa --more-sensitive -o matchesCOG0202 -f 102 --id 50 --query-cover 80 -b 25
A significant portion of my sequences were returned with the NCBI Taxonomy ID '2', for Bacteria. When I run those same sequences through NCBI web BLASTp, they are returned with very specific hits, such as 'Deltaproteobacteria bacterium HGW-Deltaproteobacteria-15'. Why would Diamond give me useless results when NCBI web BLASTp gives me specific and useful results, especially when they use the same database?
Thank you in advance for any help!
I know you posted this 2 years ago, but how did you get the prot.accession2taxid.gz and nodes.dmp files to build the database? I'm trying to build a Diamond nr database as well.