blast returns staxid N/A and staxids 0
3.0 years ago

Hello guys,

I downloaded the NCBI files and created a local database. Everything seemed to go well, but when I run

blastn -db /mnt/NTFS/NGS-DBs/NCBI-RefSeq/ViralSeq_2021-12-14 -query otus.fasta -evalue 1e-3 -word_size 11 -outfmt "6 staxid staxids"

I get N/A for staxid and 0 for staxids.

Here is the complete procedure I executed:

##Create the NCBI RefSeq viruses db

#Download sequences
genome_updater.sh -d "refseq" -g "viral" -c "all" -f "genomic.fna.gz" -o "all_virus_genomes" -t 4

#Create single fasta file containing all fasta sequences from *.fna.gz files
zcat *.fna.gz > viralseq_2021-12-14.fna

#Create BLASTN database
makeblastdb -in viralseq_2021-12-14.fna -dbtype nucl -title ViralSeq -input_type fasta -out ViralSeq_2021-12-14

Could someone please help me understand where my mistake is and how I can solve the problem? Thank you.

blastn staxid • 1.6k views
3.0 years ago

For the taxonomy options to work in BLAST, you also need to download the taxonomy information separately. It is not included in the 'default' download.

Have a look in the blast documentation (or on the NCBI website) to check how to do this.
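As a rough sketch of what "having the taxonomy info" can look like on disk: as far as I know, the NCBI taxdb archive (taxdb.tar.gz on the BLAST FTP site, also retrievable with update_blastdb.pl --decompress taxdb) unpacks to two files, taxdb.btd and taxdb.bti. The snippet below only checks whether both files are present in a given directory. The helper name has_taxdb and the taxdb_demo directory are made up for illustration, and the demo uses empty placeholder files rather than a real download.

```shell
#!/bin/sh
# has_taxdb DIR: succeed only if DIR contains both taxdb files.
# (Hypothetical helper; not part of BLAST+.)
has_taxdb() {
    [ -f "$1/taxdb.btd" ] && [ -f "$1/taxdb.bti" ]
}

# Demo on throwaway directories with empty placeholder files.
# In a real setup the files would come from the unpacked taxdb.tar.gz.
demo_dir=taxdb_demo
mkdir -p "$demo_dir/with" "$demo_dir/without"
: > "$demo_dir/with/taxdb.btd"
: > "$demo_dir/with/taxdb.bti"

for d in "$demo_dir/with" "$demo_dir/without"; do
    if has_taxdb "$d"; then
        echo "$d: taxdb files found"
    else
        echo "$d: taxdb files missing"
    fi
done
```

To check a real setup, the same function could be called on the directory that holds the database files, for example has_taxdb /mnt/NTFS/NGS-DBs/NCBI-RefSeq.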


Hello. Thank you very much for your help. I looked for the right file to download, but the NCBI repository looks like a jungle... Could you please suggest which file to download? Thank you.


All needed info should be explained on this page: https://www.ncbi.nlm.nih.gov/sites/books/NBK569841/


Hello, as suggested by lieven.sterck, I built a new db with the following command (I hope nucl_gb.accession2taxid is the right file for the mapping):

makeblastdb -in viralseq_2021-12-14_14-45-53.fna -dbtype nucl -title ViralSeq -input_type fasta -out ViralSeq_2021-12-14 -taxid_map /mnt/NTFS/NGS-DBs/KrakenViral/taxonomy/nucl_gb.accession2taxid -parse_seqids


    Building a new DB, current time: 12/15/2021 17:39:45
    New DB name:   /mnt/NTFS/NGS-DBs/NCBI-RefSeq/ViralSeq_2021-12-14
    New DB title:  ViralSeq
    Sequence type: Nucleotide
    Keep MBits: T
    Maximum file size: 1000000000B
    Adding sequences from FASTA; added 1944 sequences in 0.534802 seconds.


emilio@Alienware:/mnt/NTFS/NGS-DBs/NCBI-RefSeq$ ll
total 66472
drwxrwx--- 1 emilio emilio     4096 12月 15 17:41 ./
drwxrwx--- 1 emilio emilio     4096 12月 14 14:37 ../
drwxrwx--- 1 emilio emilio     4096 12月 14 15:33 all_virus_genomes/
-rw-r----- 1 emilio emilio 54217654 12月 14 15:42 viralseq_2021-12-14_14-45-53.fna
-rw-r----- 1 emilio emilio   131072 12月 15 17:41 ViralSeq_2021-12-14.ndb
-rw-r----- 1 emilio emilio   219943 12月 15 17:41 ViralSeq_2021-12-14.nhr
-rw-r----- 1 emilio emilio    23436 12月 15 17:41 ViralSeq_2021-12-14.nin
-rw-r----- 1 emilio emilio     7808 12月 15 17:41 ViralSeq_2021-12-14.nog
-rw-r----- 1 emilio emilio    38888 12月 15 17:41 ViralSeq_2021-12-14.nos
-rw-r----- 1 emilio emilio    23336 12月 15 17:41 ViralSeq_2021-12-14.not
-rw-r----- 1 emilio emilio 13357848 12月 15 17:41 ViralSeq_2021-12-14.nsq
-rw-r----- 1 emilio emilio    16384 12月 15 17:41 ViralSeq_2021-12-14.ntf
-rw-r----- 1 emilio emilio     7780 12月 15 17:41 ViralSeq_2021-12-14.nto

Then I repeated my blastn command, but nothing changed:

emilio@Alienware:~/TEST/Clustering/readsNotrRNA_filtered.fq.split$ blastn -db /mnt/NTFS/NGS-DBs/NCBI-RefSeq/ViralSeq_2021-12-14 -query otus.fasta -evalue 1e-3 -word_size 11 -outfmt "6 std staxid staxids" | more
OTU_1;size=2620   AC_000019.1 89.362  47  5   0   12  58  15136   15182   6.43e-09    60.2    N/A 0
OTU_1;size=2620   NC_024150.1 86.667  45  6   0   3   47  14213   14257   3.87e-06    51.0    N/A 0
OTU_1;size=2620   NC_015225.1 91.667  36  3   0   11  46  14946   14981   3.87e-06    51.0    N/A 0
OTU_1;size=2620   AC_000018.1 86.667  45  6   0   11  55  15547   15591   3.87e-06    51.0    N/A 0
OTU_1;size=2620   NC_001876.1 85.417  48  6   1   3   50  14119   14165   1.39e-05    49.1    N/A 0
OTU_1;size=2620   NC_006879.1 86.842  38  5   0   12  49  14401   14438   6.48e-04    43.6    N/A 0
OTU_2;size=3184   NC_024150.1 74.059  239 52  9   60  293 14257   14024   1.41e-17    89.8    N/A 0

I also checked whether, for example, "AC_000019.1" is present in the mapping file (nucl_gb.accession2taxid):

emilio@Alienware:/mnt/NTFS/NGS-DBs/KrakenViral/taxonomy$ grep AC_000019.1 nucl_gb.accession2taxid  
AC_000019   AC_000019.1 10522   56160914

It is there, so where is my mistake?

Thank you.


Use the 101010 button to format code. The first option (the one you are using) is for quoting text.

The downloaded taxonomy files need to be either in the same folder as your index or in the folder designated by the $BLASTDB variable. This works with version 5 (-blastdb_version 5) of the BLAST database format, which I assume you are using if you have the latest blast+ package.
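One more thing that may be worth checking, offered as a hedged sketch rather than a confirmed diagnosis: as far as I know, makeblastdb documents -taxid_map as a two-column text file (sequence ID, then taxid), whereas nucl_gb.accession2taxid has four columns, as the grep output earlier in the thread shows. Assuming that mismatch matters, a two-column map could be derived as below. The names sample.accession2taxid, taxid_map.txt and the taxid_map_demo directory are made up for illustration, and the header row in the sample file is an assumption about the layout of the NCBI file.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical name).
mkdir -p taxid_map_demo

# Tiny stand-in for nucl_gb.accession2taxid: tab-separated, with a header row
# (assumed layout). The AC_000019.1 record is copied from the grep output above.
printf 'accession\taccession.version\ttaxid\tgi\n'  > taxid_map_demo/sample.accession2taxid
printf 'AC_000019\tAC_000019.1\t10522\t56160914\n' >> taxid_map_demo/sample.accession2taxid

# Keep only accession.version (column 2) and taxid (column 3); skip the header.
awk -F '\t' 'NR > 1 { print $2 " " $3 }' \
    taxid_map_demo/sample.accession2taxid > taxid_map_demo/taxid_map.txt

cat taxid_map_demo/taxid_map.txt

# With BLAST+ installed, the database could then be rebuilt with the same
# options used earlier in the thread, pointing -taxid_map at the new file:
# makeblastdb -in viralseq_2021-12-14_14-45-53.fna -dbtype nucl -title ViralSeq \
#     -input_type fasta -out ViralSeq_2021-12-14 -parse_seqids \
#     -taxid_map taxid_map.txt
```

For the real file, the same awk command would be run on nucl_gb.accession2taxid instead of the sample. Whether this alone makes staxid and staxids appear is untested here.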
