Hi, I'm trying to do an rpsblast using ncbi-blast-2.5.0+. I have a file containing gi numbers as my query and my command line is below.
rpsblast -query t -db results/Cog/Cog -out rps-blast.out -evalue 1e-2 -outfmt 6
Oddly, rpsblast keeps giving me this error:
Warning: [rpsblast] Error initializing remote BLAST database data loader: Protein BLAST database 'Cog/Cog nr' does not exist in the NCBI servers
My query file (t) looks like this:
gi|292833481
gi|383341230
gi|289693981
I've also tried searching using just the numbers (292833481), but that still doesn't solve the problem.
However, when I try the same search using a fasta file as the query, it runs fine and gives the results I need.
rpsblast -query GCF_000005845.2_ASM584v2_protein.faa -db results/Cog/Cog -out rps-blast.out -evalue 1e-2 -outfmt 6
Is there something that I did wrong here? What is the correct way to format a list of gi's as a query for blast? Thanks!
NCBI stopped using gi numbers externally in September 2016. You should substitute the gi numbers with accession numbers.
This is probably THE correct answer to this question.
Thank you for your response! Unfortunately, I didn't see any improvement when I converted my query from gi numbers to accessions (e.g. EFL06024.1). The error "Protein BLAST database 'Cog/Cog nr' does not exist in the NCBI servers" is still there.
How old is your rps-blast? From https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMDWeb&PAGE_TYPE=BlastNews:
NCBI is hiding GI numbers from inexperienced kids now, but you can still use them in blast queries and eutils. (see below)
I could reproduce your problem using a valid identifier, so maybe it is a bug? The valid identifier worked online.
A fast solution is to download the sequences of interest and use a fasta file as the query.
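For a long ID list, the download step could be scripted against NCBI's E-utilities efetch endpoint. The following is only a rough sketch, not something tested against a 700K-ID list: the function names (batch_ids, efetch_url, fetch_fasta), the default batch size of 200, and the one-second pause are my own assumptions. It expects curl to be installed and an ID file with one gi or accession per entry.

```shell
#!/bin/sh
# Sketch: fetch protein FASTA for a list of GI/accession IDs via efetch.
# The batch size and the pause length are assumptions, not NCBI-mandated values.

EFETCH='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# Print the IDs from file $1 as comma-joined batches of at most $2 IDs,
# one batch per line. A leading "gi|" prefix is stripped from each ID.
batch_ids() {
  tr ' \t' '\n\n' < "$1" | sed 's/^gi|//' | xargs -n "$2" | tr ' ' ','
}

# Print the efetch URL that returns protein FASTA for the batch in $1.
efetch_url() {
  printf '%s?db=protein&rettype=fasta&retmode=text&id=%s\n' "$EFETCH" "$1"
}

# Fetch FASTA for every ID in file $1 (batch size $2, default 200)
# and write the sequences to stdout.
fetch_fasta() {
  batch_ids "$1" "${2:-200}" | while IFS= read -r batch; do
    curl -fsS "$(efetch_url "$batch")"
    sleep 1   # keep well under the request rate NCBI allows without an API key
  done
}

# Example (not run here):
#   fetch_fasta t > queries.faa
#   rpsblast -query queries.faa -db results/Cog/Cog -out rps-blast.out -evalue 1e-2 -outfmt 6
```

With several hundred thousand IDs this will take a while, so it may be worth splitting the ID file and checking that each batch returned as many sequences as were requested.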
edit: are you using rpsblast or rpsblast+?
Hi h.mon, I'm using rpsblast+ from the blast+ package, versions 2.5.0 and 2.6.0 (latest). Yes, I suppose downloading fasta files will be the quickest solution, although my query list is rather large (>700K). NCBI's CD-search accepts gi/accession numbers as queries (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi), but they only allow 4000 queries each time :/ I've emailed ncbi-help and will update accordingly. Thanks for your help!
Shouldn't you edit your post then? If I try rpsblast instead of rpsblast+, I get a lot of errors due to incorrect parsing of the arguments.