Can't get staxids with blastn megablast against nt
8.0 years ago

Hi everyone, I really need help. I can't get the staxids from my blastn run. This is what I ran:

blastn -task megablast -query $RND_ASSEMBLY -db $BLASTDB/nt -evalue 1e-5 -num_threads 5 -max_target_seqs 1 -outfmt "6 qseqid staxids" -out $RND_ASSEMBLY.nt.1e-5.megablast

I downloaded the nt database with the update_blastdb script, and I downloaded the two taxdb files into the same directory as my nt database. Despite this, I always get:

299958__len__337 N/A

If I change the outfmt fields, for example to salltitles, it works and I get a good output file. What can I do? Thanks so much for your help, Marine


Have you downloaded taxdb? Since you have already run the BLAST, you can use elink and efetch to get the taxID or species name.


Yes, I have downloaded taxdb, and the two files are in the same directory as my nt database :/ I don't know elink and efetch, so I will take a look. But I have no idea why -outfmt "6 qseqid staxids" returned N/A. Thanks
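
One quick sanity check, as a minimal sketch: the taxdb archive unpacks to taxdb.btd and taxdb.bti, and both need to sit where BLAST looks for the database. The helper name has_taxdb is hypothetical, and BLASTDB is assumed to point at the directory that holds nt.

```shell
# Hypothetical helper: succeeds only if both taxdb files exist in the given directory.
has_taxdb() {
    [ -f "$1/taxdb.btd" ] && [ -f "$1/taxdb.bti" ]
}

# Assumption: BLASTDB points at the directory holding nt; falls back to the current directory.
if has_taxdb "${BLASTDB:-.}"; then
    echo "taxdb files found"
else
    echo "taxdb.btd and/or taxdb.bti missing"
fi
```

If both files are reported as present and staxids is still N/A, the files are probably not the cause and the database itself is worth a closer look.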


I can't think of a reason for the N/A in your case. Here is the elink method to get the species name in a crude form:

for i in `awk -F "\t" '{if($1 != a){print $2}a=$1}' blast_file | cut -f 2 -d "|"`; do curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=protein&db=taxonomy&id=$i&rettype=docsum&retmode=text" | grep -A1 "<Link>" | perl -ne '{chomp; if($_=~/.*Id\>(\d+)/s){print $1,"\n";}}' | while read p; do  curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=${p}&rettype=docsum&retmode=txt" | perl -ne '{if($_=~/\d+.\s+(\w+\s+\w+)/){print '"$i"',"\t", '"$p"',"\t", $1,"\t",$2,"\n";}}'; done; done;
7.8 years ago

Hi all,

I had the same problem and finally found the solution. On the NCBI FTP site, the nt database is available in two different directories:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/ and ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/

The second one, in the "FASTA" folder, lets you run a megablast but does not work with the taxdb files.

You have to download from the first FTP address (ftp://ftp.ncbi.nlm.nih.gov/blast/db/), where you will find the files "nt.00.tar.gz" to "nt.44.tar.gz". This database lets you retrieve taxonomic information once the taxdb files are in the same directory.

wget "ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.*.tar.gz"
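
To confirm the fix after re-running blastn, one option is to count how many rows still carry N/A in the staxids column. This is a minimal sketch, assuming the two-column, tab-separated output of -outfmt "6 qseqid staxids"; the function name count_missing_taxids is hypothetical.

```shell
# Hypothetical helper: count rows whose second tab-separated column is exactly N/A.
count_missing_taxids() {
    awk -F '\t' '$2 == "N/A" { n++ } END { print n + 0 }' "$1"
}

# Usage (the file name comes from the command in the question):
# count_missing_taxids "$RND_ASSEMBLY.nt.1e-5.megablast"
```

A count of 0 means every hit got a taxonomy ID.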