I'm running BLAST using the Ubuntu app on a Windows 10 desktop with the Windows Subsystem for Linux (WSL) feature turned on.
I'm trying to run blastp against the NCBI nr database. I kept getting errors with the update_blastdb.pl
script, so I downloaded all the nr files into a directory, /mnt/f/blast,
and unpacked them as shown below.
wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.tar.gz'
wget 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nr.*.tar.gz.md5'
for f in nr.*.tar.gz; do tar -zxvof "$f"; done
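The .md5 files downloaded alongside the archives can be used to check each archive before it is extracted. Below is a minimal sketch, assuming each nr.##.tar.gz.md5 holds a single line in the standard md5sum format (checksum, then file name); verify_and_extract is a hypothetical helper name, not part of the BLAST+ tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check every nr archive against its .md5 file, then extract it.
# Assumes each *.tar.gz.md5 contains one "checksum  filename" line (md5sum format).
verify_and_extract() {
  local dir="$1" md5 archive
  for md5 in "$dir"/nr.*.tar.gz.md5; do
    archive="${md5%.md5}"
    # md5sum -c resolves the file name relative to the current directory,
    # so run it from inside the download directory (in a subshell).
    if ! (cd "$dir" && md5sum -c --quiet "$(basename "$md5")"); then
      echo "checksum mismatch: $archive" >&2
      return 1
    fi
    # One archive per tar call: with -f, extra arguments are member names,
    # not further archives.
    tar -zxvof "$archive" -C "$dir"
  done
}

# Usage, with the directory from the question:
#   verify_and_extract /mnt/f/blast
```

Stopping at the first mismatch means a corrupted volume is noticed before BLAST later fails on an incomplete database.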
This produced 485 files, including nr.pto, nr.ptf, nr.pos, nr.pdb and nr.pal, plus files in 8 different formats (phd, phi, phr, pin, pog, ppd, ppi, psq) for every file whose name starts with nr.##.
Then I tried running blastp like below:
blastp -query /mnt/c/Users/BL/Desktop/Blast/query_test.fasta -db /mnt/f/blast -out test.txt
This returned:
BLAST Database error: No alias or index file found for protein database [/mnt/f/blast] in search path [/mnt/f/blast::]
I looked up other threads in this forum and checked that I have an nr.pal file and that the dashes (-) in the blastp command are actual minus signs.
I also checked my blastp version (2.10.1+) and the location of blastp using which blastp, which returned /home/BL/ncbi-blast-2.10.1+/bin/blastp. What else should I look into to resolve this error?
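One common cause of this error (an assumption here, since the answer that resolved it is not reproduced in this thread) is that -db expects a database name, i.e. the directory plus the base name nr, rather than the directory itself. The sketch below approximates that lookup with a hypothetical helper, check_protein_db, under the simplifying assumption that a protein database is present when <name>.pal or <name>.pin exists.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: is there a BLAST protein database at this prefix?
# Simplification: treat <prefix>.pal (alias file, as used by multi-volume nr)
# or <prefix>.pin (index file) as evidence that the database exists.
check_protein_db() {
  local prefix="$1"
  if [ -f "$prefix.pal" ] || [ -f "$prefix.pin" ]; then
    echo "found protein database: $prefix"
  else
    echo "no .pal or .pin file for prefix: $prefix" >&2
    return 1
  fi
}

# A directory is not a database name: -db /mnt/f/blast makes BLAST look for
# files such as /mnt/f/blast.pal, which do not exist.
#   check_protein_db /mnt/f/blast      # fails
#   check_protein_db /mnt/f/blast/nr   # succeeds if /mnt/f/blast/nr.pal exists
#
# Once the prefix checks out, pass it to -db:
#   blastp -query /mnt/c/Users/BL/Desktop/Blast/query_test.fasta -db /mnt/f/blast/nr -out test.txt
# or put the directory on the BLAST search path and use the bare name:
#   export BLASTDB=/mnt/f/blast
#   blastp -query /mnt/c/Users/BL/Desktop/Blast/query_test.fasta -db nr -out test.txt
```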
Thank you! This was exactly the problem. (And yes, your assumption is also correct: /mnt/f is an external drive. I had downloaded the nr database there precisely because it was too large to install on my C drive. Would you say that BLAST users generally make space on their C drives, or use some other way to avoid using an external drive?)

Generally speaking, external drives are slower, so it won't help your case with a big database. You already made a significant effort to download nr, but if bandwidth is not an issue, I suggest you try the UniRef90 database: https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz

It is a database clustered at 90% identity. It should make no significant difference in results whether you search against nr or uniref90, but the latter is less than half the size. If you decide to do it, once you unpack the database you will have to create the indices manually using makeblastdb.

I would say that most BLAST users don't have C drives, because I assume that most run it under Linux. A database of that size, especially if used frequently, should ideally be on the fastest disk available.
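For the UniRef90 route, the download, unpacking and makeblastdb step could look roughly like this. It is a sketch: the helper name build_uniref90_db, the output name uniref90 and the -title value are illustrative choices, and it assumes wget, GNU gzip (for -k) and makeblastdb from BLAST+ are on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch UniRef90, unpack it and build a BLAST protein database.
build_uniref90_db() {
  local dir="$1"
  local url="https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz"
  (
    cd "$dir" || exit 1
    # Skip the large download if the compressed file is already there.
    [ -f uniref90.fasta.gz ] || wget "$url" || exit 1
    # -k keeps the .gz, -f overwrites an older uniref90.fasta.
    gunzip -kf uniref90.fasta.gz || exit 1
    # Protein FASTA in, BLAST protein database with base name "uniref90" out.
    makeblastdb -in uniref90.fasta -dbtype prot -out uniref90 -title UniRef90
  )
}

# Usage (ideally on the fastest disk available):
#   build_uniref90_db /mnt/f/blast
#   blastp -query query_test.fasta -db /mnt/f/blast/uniref90 -out test.txt
```

Keeping the .gz costs extra disk space but avoids a second download if indexing has to be rerun.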
Ok, I'll try out the UniRef90 database. Thank you for your suggestions!