I'm familiar with the BLAST family of software: I've used both the old interface (blastall, formatdb, et al.) and the new interface (blastx, makeblastdb, et al.). However, I've always used it with in-house databases. I've never tried downloading and using NCBI's non-redundant database...which is what I'm trying to do now.
Turns out someone in our lab recently downloaded the nr and nt databases using the update_blastdb.pl script, so that saves me that trouble. However, I am having issues when I try to run BLAST against the database.
I created a FASTA file that has a single query sequence in it...maybe several hundred bp long. When I run a simple command like either of the two below, it runs without any end in sight (consuming a lot of RAM too).
$ blastall -p blastx -i test.fasta -d /data/blast/db/nr -m 7
^C
$ blastall -p blastn -i test.fasta -d /data/blast/db/nt -m 7
^C
So I thought 'ok, maybe I'm supposed to point it at the alias file', so I tried the following commands, which ended immediately in an error.
$ blastall -p blastx -i test.fasta -d /data/blast/db/nr.pal -m 7
[blastall] FATAL ERROR: AT1G51370.2: Database /data/blast/db/nr.pal was not found or does not exist
$ blastall -p blastn -i test.fasta -d /data/blast/db/nt.pal -m 7
[blastall] FATAL ERROR: AT1G51370.2: Database /data/blast/db/nt.pal was not found or does not exist
I've run fastacmd to make sure the databases are working correctly and I don't see any problems.
$ fastacmd -d /data/blast/db/nr -I
Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
10,688,764 sequences; 3,647,636,407 total letters
File names:
/data/blast/db/nr.00
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 36,805 res
/data/blast/db/nr.01
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 35,213 res
/data/blast/db/nr.02
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 33,423 res
/data/blast/db/nr.03
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 33,423 res
$ fastacmd -d /data/blast/db/nt -I
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
11,257,610 sequences; 30,637,862,539 total letters
File names:
/data/blast/db/nt.00
Date: Mar 25, 2010 2:13 PM Version: 4 Longest sequence: 7,215,267 bp
/data/blast/db/nt.01
Date: Mar 25, 2010 2:13 PM Version: 4 Longest sequence: 9,105,828 bp
/data/blast/db/nt.02
Date: Mar 25, 2010 2:13 PM Version: 4 Longest sequence: 7,074,893 bp
/data/blast/db/nt.03
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 6,365,727 bp
/data/blast/db/nt.04
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 27,905,053 bp
/data/blast/db/nt.05
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 13,033,779 bp
/data/blast/db/nt.06
Date: Mar 25, 2010 2:13 PM Version: 4 Longest sequence: 8,545,929 bp
/data/blast/db/nt.07
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 10,467,782 bp
/data/blast/db/nt.08
Date: Mar 25, 2010 5:42 PM Version: 4 Longest sequence: 10,341,314 bp
Any ideas what the issue might be?
Your first commands should be correct. How long have you let them run? Searching the nr/nt databases can take a long time, so you should probably try a smaller database first as a proof of concept.
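One way to get a smaller database for that proof of concept is to cut a subset out of a FASTA dump and format it separately. This is only a rough sketch: first_n_records is a hypothetical helper (not part of BLAST), and subset.fasta and subset_db are placeholder names. The commented usage lines assume the legacy tools (fastacmd, formatdb, blastall) are on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the first N records of a FASTA stream read
# from stdin (N defaults to 1000). Not part of the BLAST distribution.
first_n_records() {
    # Count header lines; stop before printing header number N+1.
    awk -v n="${1:-1000}" '/^>/ { if (++seen > n) exit } { print }'
}

# Example usage (placeholder file and database names):
#   fastacmd -d /data/blast/db/nt -D 1 | first_n_records 1000 > subset.fasta
#   formatdb -i subset.fasta -p F -n subset_db
#   blastall -p blastn -i test.fasta -d subset_db -m 7
```

If the search against the small database finishes quickly, the setup itself is probably fine and the nr/nt runs are just slow.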
How did your co-worker get update_blastdb.pl working? We are having difficulties; see C: What Is The Best Way To Download Genbank Locally?