Diamond and Taxonomic information
0
0
Entering edit mode
2.5 years ago

Dear all, I am going crazy...hopefully someone can help me.

I have done the following steps to create the diamond RefSeq protein db:

Download data:

wget -c ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/complete.nonredundant_protein*.faa.gz
wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
wget -c ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip [then unzipped]

Of course, I verified the integrity of all files.

Create the diamond db with taxonomic information:

zcat *.faa.gz | diamond makedb --taxonmap prot.accession2taxid.gz --taxonnames taxDMP/names.dmp --taxonnodes taxDMP/nodes.dmp -d refseq_protein_nonredund_diamond -v --log

The database has been successfully created:

The database will contain:

Database sequences                   188429555
Database letters                     65043307720
Accessions in database               188429555
Entries in accession to taxid file   1045057754
Database accessions mapped to taxid  188339268
Database sequences mapped to taxid   188339268
Database hash                        0987f8827e574693df8aa10181456276
Total time                           4712s

Then, I downloaded to "dummy-well-known" fasta files of viruses [I repeated the procedure for both HPV and Herpes] to check: 1. the blastx operation; 2. the taxonomy association:

Download virus fasta:

efetch -dbnuccore -id NC_027779.1 -format fasta > HumanPapillomaVirus.fasta

Blastx operation:

diamond blastx -d /home/emilio/DiamondDB/refseq_protein_nonredund_diamond.dmnd -p 16 -q HumanPapillomaVirus.fasta -f 6 qseqid staxids bitscore sseqid pident length mismatch gapopen qstart qend sstart send evalue -o HumanPapillomaVirus_DiaMond.out -t /tmp -c 4 -b 0.77 -k 1 -v --log

Blastx result:

emilio@frodo:~/viruses$ head HumanPapillomaVirus_DiaMond.out
NC_027779.1 1773    503 WP_215534908.1  48.4    541 220 8   5246    6856    1   486 3.71e-156

Then, I used the "staxid" as argument for taxonkit:

taxonkit --data-dir /home/emilio/DiamondDB/taxDMP/ list list --show-rank --show-name --indent "    " --ids 1773

Obtaining a long list of TAXA, such as:

1773 [species] Mycobacterium tuberculosis
    1765 [biotype] Mycobacterium tuberculosis variant bovis
        33892 [no rank] Mycobacterium tuberculosis variant bovis BCG
            410289 [strain] Mycobacterium tuberculosis variant bovis BCG str. Pasteur 1173P2
            413996 [strain] Mycobacterium tuberculosis variant bovis BCG str. Moreau RDJ
            561275 [strain] Mycobacterium tuberculosis variant bovis BCG str. Tokyo 172
            717522 [strain] Mycobacterium tuberculosis variant bovis BCG str. Mexico
            998089 [strain] Mycobacterium tuberculosis variant bovis BCG str. China
            998090 [strain] Mycobacterium tuberculosis variant bovis BCG str. ATCC 35733
            998091 [strain] Mycobacterium tuberculosis variant bovis BCG str. ATCC 35740
            998092 [strain] Mycobacterium tuberculosis variant bovis BCG str. ATCC 35743
            1195237 [strain] Mycobacterium tuberculosis variant bovis BCG str. Frappier
            1195465 [strain] Mycobacterium tuberculosis variant bovis BCG str. Glaxo
            1195466 [strain] Mycobacterium tuberculosis variant bovis BCG str. Moreau
            1195467 [strain] Mycobacterium tuberculosis variant bovis BCG str. Phipps
            1195468 [strain] Mycobacterium tuberculosis variant bovis BCG str. Prague
            1195469 [strain] Mycobacterium tuberculosis variant bovis BCG str. Sweden
            1206780 [strain] Mycobacterium tuberculosis variant bovis BCG str. Korea 1168P

Of course, the taxonomic association is wrong (same wrong result for Herpes Virus)

I checked the 1773 taxid from Uniprot, obtaining the same result as taxonkit:

Mnemonic i  MYCTX
Taxon identifier i  1773
Scientific name i   Mycobacterium tuberculosis
Taxonomy navigation 
Up › Mycobacterium tuberculosis complex
Down All lower taxonomy nodes (2,945)
Common name i   -
Synonym i   -
Other names i   ›"Bacillus tuberculosis" (Zopf 1883) Klein 1884
›"Bacterium tuberculosis" Zopf 1883
›"Mycobacterium tuberculosis typus humanus" Lehmann and Neumann 1907
›"Mycobacterium tuberculosis var. hominis" Bergey et al. 1934
›ATCC 25618

Please, can someone help me to understand my fault?

Thank you very much Emilio

diamond taxids taxonkit taxonomy • 1.2k views
ADD COMMENT
0
Entering edit mode

No one can help me? Any idea? Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1984 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6