I am planning to set up and maintain a local version of the NR and other NCBI databases, for running in-house BLAST searches.
I would also like my local version of the databases to stay in sync with NCBI through regular updates.
NCBI suggests using the update_blastdb.pl script (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) to download the latest versions of all the pre-formatted databases.
Does anyone have experience to share on using this script? Are there alternative solutions?
Will appreciate everyone's feedback.
Thanks,
Anjan
It's fine to constantly update the blast databases, but then you need to document which release you used when you did your analysis, right? With a new database, you might get slightly different hits...
Ah yes, I believe running fastacmd with the -I option returns the version of the database.
So fastacmd -d $HOME/blastdb/nt -I returns the version of the nt database. This output can easily be tacked onto the end of a BLAST report to keep track of the database version.
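Tacking the database information onto a report could be scripted roughly as follows. This is a minimal sketch, not a tested pipeline: the function name append_db_info and the DBINFO_CMD override are made up for illustration, and it assumes the legacy fastacmd tool is on the PATH (with BLAST+, blastdbcmd -db nt -info prints comparable information).

```shell
# Append database information to the end of an existing BLAST report.
# DBINFO_CMD is a hypothetical hook so the function can be tried without
# BLAST installed; by default it calls the legacy fastacmd tool.
append_db_info() {
    local db="$1" report="$2"
    {
        echo "# Database info recorded on $(date +%Y-%m-%d)"
        ${DBINFO_CMD:-fastacmd} -d "$db" -I
    } >> "$report"
}

# Example call (paths are placeholders):
#   append_db_info "$HOME/blastdb/nt" my_search.blast.txt
```

Keeping the information in the same file as the hits means the report still says which database release it came from after the local copy has been updated.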
+1 @flxlex: Agree, the first thing any script should do is get and log the version of the database used before pulling data; or, for that matter, of any data source.
@Anjan: The idea is not to log the updates, but to store the version of BLAST the results were created with in the result set, or at least this is what I meant. If you have any questions, just comment again. Cheers!
[EDIT] @Anjan: The idea is not to log the updates when they are installed, but to store the version of BLAST used to produce the results together with the result data. If you have any questions, just comment again. Cheers!
NCBI used to provide a method for incremental update of local databases. Its disadvantage was that the local and remote copies diverged over time. It looks like they've abandoned this approach with the new update script.
The update_blastdb.pl script looks fine. All it does is download the pre-formatted BLAST databases, if the local copies are either absent or older than the remote copies. I would just give it a try; if it's not to your liking, it's easy to implement something similar using any scripting language.
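The "absent or older" check described here can be sketched in a few lines of shell. This is only an illustration and not how update_blastdb.pl is implemented: the function name needs_update is made up, the remote modification time is assumed to be supplied as epoch seconds, and stat -c %Y assumes GNU coreutils.

```shell
# Succeed (exit status 0) if the local file is missing or older than the
# remote copy, whose modification time is passed in as epoch seconds.
needs_update() {
    local file="$1" remote_epoch="$2"
    [ ! -e "$file" ] && return 0
    local local_epoch
    local_epoch=$(stat -c %Y "$file")   # GNU stat; BSD/macOS would need: stat -f %m
    [ "$local_epoch" -lt "$remote_epoch" ]
}

# Example call (file name and timestamp are placeholders):
#   if needs_update nr.00.tar.gz 1700000000; then echo "download needed"; fi
```

For plain mirroring, wget -N (timestamping) makes a comparable decision without any custom code.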
You should also decide how often you want to check for updates: daily, weekly, monthly? - and set up a cron job to automate the process. Here's one tutorial, or else just search the web for "cron tutorial".
It will download the pre-formatted database files to whichever directory you specify. That should be all the "installation" required. When running BLAST, you either specify the path to the database files or define it in a configuration file.
No installation required. However, you have to untar and unzip the files and get rid of the zip files; none of this is done by the script. Again, not a difficult task to code.
I have added a loop to my Perl script that checks the list of running jobs for any active BLAST runs. If any BLAST jobs are detected, the script sleeps for 2 minutes, wakes up, and resamples the job list.
Here is the code snippet:
# Use top to get a snapshot of running processes. If a BLAST job is running, sleep for 120 s, then resample.
while (1) {
    my $status = `top -b -n1`;
    if ($status =~ /blast/) {    # also matches blastall
        sleep(120);
        next;
    }
    last;
}
HTH
updated 5.6 years ago by Ram • written 12.4 years ago by Anjan
No probs, but keep in mind that if a new nr.03.tar.gz appears, it will be downloaded but not extracted. So perhaps it would be better to embed it in a shell 'for' loop. I was checking it manually (it does not happen so often), and adding new lines if necessary ;-)
You may try:
for file in nr.??.tar.gz; do tar -zxvf "$file" &> "$file.tar.log" && rm "$file" &> "$file.rm.log"; done
Okay, how can we know when the pre-formatted database was last updated? For instance, is there a change log file where NCBI states when their version of nr/nt on their FTP site was last updated?
Salut Adrian. I wrote a tool that will run at computer start up and check if the local databases are old. The v2 of this tool will be available by the end of tomorrow.
This looks really comfortable :) Unfortunately, I can only use it on my home PC, which runs MS Windows. I will try it and leave feedback. Can I contact you here: http://www.dnabaser.com/download/nextgen-fastq-editor/contact.html ? As always, I must recommend that you release the source :) or at least port it to Java so that it's OS independent.
Hi Adrian. Yes, that's the good link for contacting me.
About the port: the program was written in Delphi. Some months ago I upgraded my license to Delphi 21, which can build for Win, OS X, iOS and Android (and I think some other platforms too, but not Linux). I am still fiddling around to see how this works :) So, there will be a Mac port quite soon. Linux support will come when Delphi supports it. But since bioinformaticians are exclusively on Linux and they DON'T need my tool, Linux is not a priority anyway.
I made a script that checks if there is a blast job currently running, waits until it is done, deletes the old dbs, then downloads the new ones and moves them into the same directory name. The script outputs everything into a dated log for archival purposes. Anyone have any suggestions for improvements?
The crontab entry for it sets up a cron job that runs the script on the 1st of January, April, July, and October, at 3 AM. So you get quarterly updates!
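A crontab entry matching that schedule (the 1st of January, April, July, and October at 3 AM) might look like the sketch below. The script path is a placeholder rather than something stated in the post, so adjust it to wherever update_blast.sh actually lives.

```shell
# Placeholder path (an assumption): point this at your own update_blast.sh.
script_path="/path/to/update_blast.sh"

# Crontab fields: minute hour day-of-month month day-of-week command
cron_line="0 3 1 1,4,7,10 * $script_path"
echo "$cron_line"

# One way to install it while keeping existing crontab entries:
#   (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

Running crontab -l afterwards is a quick way to confirm the entry was saved.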
#!/bin/bash
# Name this file "update_blast.sh" and put it in the same directory as your
# "update_blastdb.pl" file. The nr database will be saved to "nr/", and the taxdb
# will be saved to "taxdb/" in the same directory. Run this script via the
# terminal or via a cron job.
# cron format http://www.nncron.ru/help/EN/working/cron-format.htm
# http://askubuntu.com/questions/2368/how-do-i-set-up-a-cron-job
# change the directory of $PWD to directory of script
cd "$(dirname "$0")" || exit 1
# conditional shell scripts: http://askubuntu.com/questions/157779
# bash wait for process to start
# Define a timestamp function
timestamplong() { date +"%Y%m%d_%H-%M-%S"; }
timestampshort() { date +"%Y%m%d"; }
logfile="$(timestampshort)_blastupdate.log"
# wait until blast is done to start
echo "# $(timestamplong) Log file created. Attempting to update blast and taxdb." >> "$logfile" 2>&1
echo "# $(timestamplong) Waiting until blast processes are done before continuing." >> "$logfile" 2>&1
while ps aux | grep ' /bin/blast' | grep -v 'grep' > /dev/null
do
    sleep 1
done
echo "# $(timestamplong) Blast processes complete. Proceeding with download." >> "$logfile" 2>&1
echo "# $(timestamplong) Deleting current nr and taxdb databases." >> "$logfile" 2>&1
rm -rf nr/
rm -rf taxdb/
echo "# $(timestamplong) Starting updateblast script." >> "$logfile" 2>&1
perl update_blastdb.pl --verbose --decompress nr taxdb >> "$logfile" 2>&1
echo "# $(timestamplong) Moving the taxdb and nr databases to nr/ and taxdb/" >> "$logfile" 2>&1
mkdir taxdb
mv taxdb.* taxdb
mkdir nr
mv nr.* nr
echo "# $(timestamplong) Update complete! blastdb and taxdb are now the most recent versions." >> "$logfile" 2>&1
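One possible refinement, offered as a sketch under assumptions rather than as part of the posted script: since the script deletes the old databases before downloading, a failed download leaves nothing to search against. Downloading into a staging directory and swapping it in only on success avoids that. The function name staged_update and the FETCH_CMD hook are made up for illustration; the default command assumes update_blastdb.pl sits in the script's directory, as in the post.

```shell
# Fetch a database into a staging directory, then swap it into place only
# if the fetch succeeded. FETCH_CMD is a hypothetical override hook; the
# default assumes update_blastdb.pl is one level above the staging directory.
staged_update() {
    local name="$1" staging
    staging="$(mktemp -d "./${name}.staging.XXXXXX")" || return 1
    if ( cd "$staging" && ${FETCH_CMD:-perl ../update_blastdb.pl --decompress} "$name" ); then
        rm -rf "${name}.old"
        [ -d "$name" ] && mv "$name" "${name}.old"   # set the current copy aside
        mv "$staging" "$name"                        # put the new copy in place
        rm -rf "${name}.old"
    else
        rm -rf "$staging"                            # existing database stays untouched
        return 1
    fi
}

# Example calls, mirroring the nr/ and taxdb/ layout used above:
#   staged_update nr && staged_update taxdb
```

The trade-off is disk space: old and new copies exist side by side until the swap completes.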
+1 @flxlex: Agree. Was thinking the first thing any script should do is get and log the version of the database used before pulling data.
I like the idea of maintaining a log of updates. The script does not create one. However, it should not be difficult to start a log.
+1 @Anjan: Cool, thanks for posting the command-lines, and glad you were able to figure out what I was trying to say. Cheers!
You may want to try this: http://www.dnabaser.com/download/NCBI-BLAST-downloader/