BLAST returns "N/A" for taxonomic data
langziv, 4.0 years ago

Hello.

I'm trying to get taxonomic data, such as scientific and common names, but I keep getting "N/A" for every taxonomic field. I updated the taxdb, and it's in the same directory as the nucleotide database. I also tried specifying the path, as suggested in previous posts on similar issues.

The script is

export BLASTDB="/.../biodb/BLAST/nucleotide3/taxdb"

module load blast/blast-2.10.0

cd /.../output/blast

for file in ./*.fa; do
  output=${file#"./scaffold_"}
  output=${output%.fa}
  output=${output}_blast.txt
  blastn -query "$file" \
  -db /bioseq/biodb/BLAST/nucleotide3/nt \
  -max_hsps 1 -max_target_seqs 1 -num_threads 20 \
  -out "$output" \
  -outfmt "6 qseqid sseqid pident staxids sscinames scomnames qstart qend length sstart send slen evalue mismatch gapopen bitscore"
done

Here's an example of one line from an output file:

scaffold_11 gi|1530013355|ref|XM_010738424.3| 97.059 215358 N/A N/A 68936 68969 34 1372 1339 2896 0.11 1 0 58.4

As you can see, every field is filled in, including the taxid (staxids), except the taxonomic names (sscinames and scomnames), which come back as N/A.
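To check how many hits are affected in each result file, here is a small awk sketch. It assumes tab-separated -outfmt 6 output with the column order from the command above (sscinames in column 5, scomnames in column 6); `count_missing_names` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Count hits in a tabular (-outfmt 6) BLAST result that have no taxonomic names.
# Assumes the column order used above: staxids = column 4,
# sscinames = column 5, scomnames = column 6 (tab-separated).
count_missing_names() {
  local result_file="$1"
  awk -F'\t' '
    { total++ }
    $5 == "N/A" && $6 == "N/A" { missing++ }
    END { printf "%d of %d hits have no taxonomic names\n", missing + 0, total + 0 }
  ' "$result_file"
}
```

Usage: `for f in ./*_blast.txt; do echo "$f: $(count_missing_names "$f")"; done`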

Thanks!


NCBI taxonomy is notoriously inaccurate and incomplete. Are you sure this data exists for this entry? Have you verified any other way?


Thank you for the reply. "N/A" appears in every line of the results; not a single line contains taxonomic data other than the taxid. It's weird, because the database is on the computer I work on, not on an NCBI server, at the same path as the nucleotide database. I'm guessing it's some bug in the BLAST software.

I thought I could look for software that retrieves such data given the taxid.
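One way to do that outside BLAST (a swapped-in approach, not one suggested in this thread) is to read names.dmp from NCBI's taxonomy dump (taxdump.tar.gz under https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/). A minimal sketch, assuming a file with one taxid per line; `taxid_to_name` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Look up scientific names for taxids using NCBI's names.dmp, independently of taxdb.
# names.dmp rows look like: 9606<TAB>|<TAB>Homo sapiens<TAB>|<TAB><TAB>|<TAB>scientific name<TAB>|
taxid_to_name() {
  local names_dmp="$1" taxid_list="$2"
  awk -F'\t[|]\t' '
    # First file (names.dmp): remember the scientific name for each taxid.
    NR == FNR {
      sub(/\t[|]$/, "", $4)   # strip the trailing "<TAB>|" from the name-class field
      if ($4 == "scientific name") name[$1] = $2
      next
    }
    # Second file: one taxid per line; print it with its name, or N/A if unknown.
    { print $1 "\t" (($1 in name) ? name[$1] : "N/A") }
  ' "$names_dmp" "$taxid_list"
}
```

Note that staxids can hold several taxids separated by ';', so a list can be built with something like `cut -f4 result_blast.txt | tr ';' '\n' | sort -u > taxids.txt` (file names hypothetical).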


Could it be this line? export BLASTDB="/.../biodb/BLAST/nucleotide3/taxdb"

That file path looks malformed to me: it starts with /.../


That's not the whole line. I replaced part of it with "...".
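Whatever the elided part of the path is, BLASTDB is expected to name a directory, and BLAST looks for the taxdb files inside it. If "taxdb" at the end of that path is the file prefix rather than a subdirectory, the variable points at something that is not a directory. A small check, assuming the taxdb.btd / taxdb.bti file names shipped for BLAST 2.10; `check_taxdb_dir` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Check that a directory (default: $BLASTDB) contains the taxdb files.
# Assumes the taxdb.btd / taxdb.bti names from NCBI's taxdb.tar.gz.
check_taxdb_dir() {
  local dir="${1:-${BLASTDB:-}}"
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  local f
  for f in taxdb.btd taxdb.bti; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "taxdb found in $dir"
}
```

Running `check_taxdb_dir` with no argument tests the current value of BLASTDB.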


Hey, I am having a similar issue. Was this ever resolved?


Most likely the database was built without the correct flags. It should be something like:

makeblastdb -dbtype nucl -in the_database.fasta -taxid_map the_database_taxids.txt -parse_seqids

the_database_taxids.txt contains the NCBI taxonomy ID for each sequence ID in the database.

For example, for the FASTA file

>one hello_more_stuff
AGCA

the taxid file would be

one 1234

If you build the database without -taxid_map, you will get N/A for both the taxonomy ID and the taxonomic details.
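Since any sequence without a map entry ends up with N/A, it can be worth checking the map against the FASTA before running makeblastdb. A minimal sketch, assuming the sequence ID is the first word of each header (as in the example above) and is written identically in the map; with -parse_seqids, IDs such as gi|...|ref|... may need normalizing first. `ids_missing_taxid` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print FASTA sequence IDs that have no entry in a "<SequenceId> <TaxonomyId>" map,
# i.e. sequences that would get N/A taxonomy after makeblastdb -taxid_map.
ids_missing_taxid() {
  local fasta="$1" taxid_map="$2"
  awk '
    NR == FNR { mapped[$1] = 1; next }   # first file: the taxid map
    /^>/ {
      id = substr($1, 2)                 # header ">one hello_more_stuff" gives "one"
      if (!(id in mapped)) print id
    }
  ' "$taxid_map" "$fasta"
}
```

Usage: `ids_missing_taxid the_database.fasta the_database_taxids.txt` prints nothing when every sequence is mapped.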


Thank you for your reply. I built the database as shown below, passing the taxid map file with -taxid_map. I am also running BLAST in the same directory that holds both the BLAST index files and the taxid map file. The FASTA file is the NCBI nt database from 2020, with its corresponding taxid map file. I have also tried the more recent prebuilt BLAST databases at https://ftp.ncbi.nlm.nih.gov/blast/db/, but I get similar errors.

Is there some curation I have to do with the Taxid Map file in order to use this?

 > makeblastdb -in 2nt.20200510.fasta -dbtype nucl -title 2nt -out 2nt -blastdb_version 5 -parse_seqids -taxid_map nucl_gb.accession2taxid

I am also including my BLAST command. As you can imagine, "N/A" shows up in place of the taxid. During the run I also see: "Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz". So it's not locating the taxid map file, but why?

> blastn -task megablast -query Hep_WGS19_291.downsampled-100000.fasta -db 2nt -max_target_seqs 50 -num_threads `nproc` -outfmt "6 qseqid sacc stitle staxids sscinames sskingdoms qlen slen length pident qcovs evalue" -out "2smaller_nt_test.tsv"

I really would appreciate any feedback!
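On the curation question: the taxid map format shown earlier in the thread has two columns, while NCBI's accession2taxid files have a header line and four tab-separated columns (accession, accession.version, taxid, gi). The thread doesn't establish whether makeblastdb reads the four-column file as intended (the eventual fix here was the taxdb), but if in doubt, here is a sketch that reduces it to the two-column form. It assumes the FASTA headers use accession.version as the sequence ID; `accession2taxid_to_map` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Convert an NCBI *.accession2taxid file (optionally .gz) into the two-column
# "<SequenceId> <TaxonomyId>" form used with makeblastdb -taxid_map.
# Input columns: accession, accession.version, taxid, gi (first line is a header).
accession2taxid_to_map() {
  local src="$1" dest="$2"
  if [[ "$src" == *.gz ]]; then
    gzip -dc "$src"
  else
    cat "$src"
  fi | awk -F'\t' 'NR > 1 { print $2 " " $3 }' > "$dest"
}
```

Usage (output file name hypothetical): `accession2taxid_to_map nucl_gb.accession2taxid nt_taxid_map.txt`, then pass `-taxid_map nt_taxid_map.txt` to makeblastdb.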


Ah, OK: in your case it can't find the taxdb. So the taxonomy IDs have been added correctly, but BLAST can't look up their names anywhere.

There are several places BLAST looks for the taxdb, including your current working directory. Have you tried downloading taxdb.tar.gz and extracting its contents into your working directory? That might fix it.
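Those steps as a shell function, for reference. This is a sketch: `install_taxdb` is a hypothetical name, it assumes curl and tar are available, and it unpacks into the current directory unless another directory is given.

```shell
#!/usr/bin/env bash
# Download NCBI's prebuilt taxdb and unpack it into a directory BLAST searches
# (by default the current working directory, i.e. where blastn is run).
install_taxdb() {
  local dest="${1:-.}"
  local url="https://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz"
  mkdir -p "$dest" || return 1
  curl -fsSL -o "$dest/taxdb.tar.gz" "$url" || return 1
  tar -xzf "$dest/taxdb.tar.gz" -C "$dest" || return 1
  rm -f "$dest/taxdb.tar.gz"
  ls "$dest"/taxdb.bt?   # should list taxdb.btd and taxdb.bti
}
```

Run it from the directory you launch blastn from, or unpack it elsewhere and point BLASTDB at that directory.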


This worked, thanks!
