Downloading And Maintaining A Local, Blast-Able Nr Database
13.7 years ago
Anjan ▴ 840

I am planning to set up and maintain a local copy of NR and other NCBI databases for running in-house BLAST searches. I would also like my local copies to stay in sync with NCBI through regular updates. NCBI suggests using the update_blastdb.pl script (http://www.ncbi.nlm.nih.gov/blast/docs/update_blastdb.pl) to download the latest versions of all the pre-formatted databases. Does anyone have experience with this script to share? Are there alternative solutions? I will appreciate everyone's feedback. Thanks, Anjan

ncbi database blast installation • 34k views

It's fine to constantly update the blast databases, but then you need to document which release you used when you did your analysis, right? With a new database, you might get slightly different hits...

Ah yes, I believe running fastacmd with the -I option returns the version of the database. So fastacmd -d $HOME/blastdb/nt -I returns the version of the nt database. This output can easily be tacked onto the end of a BLAST report to keep track of the database version.
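
A minimal sketch of tacking database information onto a report. fastacmd belongs to the legacy BLAST toolkit; in BLAST+ the comparable call is blastdbcmd with -db and -info. The helper name stamp_report and the file name results.txt are made up for illustration.

```shell
#!/bin/bash
# stamp_report is a hypothetical helper, not part of BLAST.
# It appends a commented footer describing the database to a report file.
stamp_report() {
    local report="$1"   # path to an existing BLAST report
    local db_info="$2"  # free text describing the database
    {
        printf '# --- database info (recorded %s) ---\n' "$(date +%Y-%m-%d)"
        # Prefix every line of the description with "# "
        printf '%s\n' "$db_info" | sed 's/^/# /'
    } >> "$report"
}

# Usage (assumes BLAST+ is installed and $HOME/blastdb/nt exists):
#   stamp_report results.txt "$(blastdbcmd -db "$HOME/blastdb/nt" -info)"
```

Keeping the footer as comment lines means tabular reports should still parse in tools that skip lines starting with "#".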

+1 @flxlex: Agreed. The first thing any script should do is get and log the version of the database used before pulling data; the same goes for any other data source.

I like the idea of maintaining a log of updates. The script does not create one. However it should not be difficult to start a log.

@Anjan: The idea is not to log the updates when they are installed, but to store the version of BLAST used to produce the results together with the result data. If you have any questions, just comment again. Cheers!

+1 @Anjan: Cool, thanks for posting the command-lines, and glad you were able to figure out what I was trying to say. Cheers!

13.7 years ago
Neilfws 49k

NCBI used to provide a method for incremental update of local databases. Its disadvantage was that the local and remote copies diverged over time. It looks like they've abandoned this approach with the new update script.

The update_blastdb.pl script looks fine. All it does is download the pre-formatted BLAST databases, if the local copies are either absent or older than the remote copies. I would just give it a try; if it's not to your liking, it's easy to implement something similar using any scripting language.
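
For anyone who does want to roll their own, here is a rough sketch of the "absent or older" test described above. It is not how update_blastdb.pl is implemented. The helper name needs_update is made up, GNU stat is assumed, and fetching the remote modification time is left out entirely.

```shell
#!/bin/bash
# needs_update is a hypothetical helper illustrating the "absent or older" test.
# Assumes GNU coreutils (stat -c %Y); on BSD/macOS the equivalent is stat -f %m.
needs_update() {
    local local_file="$1"    # local archive, e.g. nr.00.tar.gz
    local remote_mtime="$2"  # remote modification time, seconds since the epoch
    # Absent locally: a download is needed.
    [ -e "$local_file" ] || return 0
    local local_mtime
    local_mtime=$(stat -c %Y "$local_file")
    # Older than the remote copy: a download is needed.
    [ "$local_mtime" -lt "$remote_mtime" ]
}

# Usage:
#   if needs_update nr.00.tar.gz "$remote_epoch"; then echo "fetch it"; fi
```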

You should also decide how often you want to check for updates: daily, weekly, monthly? - and set up a cron job to automate the process. Here's one tutorial, or else just search the web for "cron tutorial".

It will download the pre-formatted database files to whichever directory you specify. That should be all the "installation" required. When running BLAST, you either specify the path to the database files or define it in a configuration file.

+1 @neilfws: Much more relevant answer, one question though -- does the update_blastdb.pl file install the updates, or just download them?

+1 @neilfws: Thanks for the clarification.

No installation is required. However, you have to untar and unzip the files and get rid of the archives afterwards. None of this is done by the script. Again, not a difficult task to code.

I wonder what the preferred way is to deal with BLAST jobs that are running at the time of the scheduled update_blastdb.pl run?

I have added a loop in my perl script that checks the list of running jobs for any active blast runs. If any blast jobs are detected the script goes to sleep for 2 minutes, reawakes and resamples the jobs list. Here is the code snippet:

while (1) { # Use top to get a snapshot of running processes. If a BLAST job is running, sleep for 120 s, then resample.
        my $status = `top -b -n1`;
        if ($status =~ /blast/) {   # also matches blastall, blastn, blastp, ...
            sleep(120);
            next;
        }
        else {
            last;
        }
}

HTH

13.7 years ago
Blunders ★ 1.1k

Possible you've seen these pages, but since you didn't link to them I'm posting them:

As for the sync, I'd suggest finding a way to monitor this page, and get an email alert on updates (since I was unable to find an email alert for updates): ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ChangeLog

Upon getting an email alert, I'd manually review the updates posted - then do a build if needed.

The update_blastdb.pl script checks whether the remote files are newer than the local ones; I don't think an email alert is necessary.

13.7 years ago
Jan Kosinski ★ 1.6k

You may also try this: http://dunbrack.fccc.edu/BioDownloader/BioDownloader.php

I have never tried it, as it runs only under Windows, but perhaps you can run it using Wine on Linux.

I have also used update_blastdb.pl successfully, as follows (saved as a shell script and run from crontab):

echo "downloading nr"
cd /home2/db/blast; nice -n +15 ./update_blastdb.pl --passive --timeout 300 --force --verbose nr &> nr.updatedb.log
echo 'untaring nr'
tar -xzvf nr.00.tar.gz &>nr.00.tar.log
tar -xzvf nr.01.tar.gz &>nr.01.tar.log
tar -xzvf nr.02.tar.gz &>nr.02.tar.log
tar -xzvf nr.03.tar.gz &>nr.03.tar.log

rm nr.00.tar.gz &>nr.00.rm.log
rm nr.01.tar.gz &>nr.01.rm.log
rm nr.02.tar.gz &>nr.02.rm.log
rm nr.03.tar.gz &>nr.03.rm.log

Thank you Jan, this is the most complete solution I have come across. You even have a log!

No probs, but keep in mind that if a new nr.03.tar.gz appears, it will be downloaded but not extracted. So perhaps it would be better to put the extraction into a shell 'for' loop. I was checking it manually (it does not happen so often) and adding new lines if necessary ;-)

You may try: for file in nr.??.tar.gz; do tar -zxvf "$file" &> "$file.tar.log"; rm "$file" &> "$file.rm.log"; done

(not tested, there may be typos)
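
A slightly more defensive sketch of that loop: an archive is deleted only if tar succeeded, so a truncated download is not lost silently. The helper name extract_and_clean is made up, and the two-character volume pattern is taken from the one-liner above and may not match current volume numbering.

```shell
#!/bin/bash
# extract_and_clean is a hypothetical helper: unpack every archive matching
# <db>.NN.tar.gz in a directory and delete an archive only if tar succeeded.
extract_and_clean() {
    local dir="$1"   # directory holding the downloaded archives
    local db="$2"    # database name, e.g. nr
    local archive
    for archive in "$dir/$db".??.tar.gz; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$archive" ] || continue
        if tar -xzf "$archive" -C "$dir"; then
            rm -- "$archive"
        else
            echo "extraction failed, keeping $archive" >&2
        fi
    done
}

# Usage:
#   extract_and_clean /home2/db/blast nr
```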

Sorry, I meant "new nr.04.tar.gz".

8.7 years ago

I had the same problem two days ago. What I did was:

  1. Install NCBI BLAST on your OS.
  2. Download the update_blastdb.pl script, which updates your local databases.
  3. Download the database using the following command line:

    $ perl update_blastdb.pl --passive nt

10.8 years ago
Adrian Pelin ★ 2.6k

Okay, how can we know when the pre-formatted database was last updated? For instance, is there a change log file where NCBI states when their version of nr/nt on their FTP site was last updated?

AFAIK, NCBI does a weekly release of data every Monday. HTH.

Salut Adrian. I wrote a tool that will run at computer start up and check if the local databases are old. The v2 of this tool will be available by the end of tomorrow.

http://www.dnabaser.com/download/NCBI-BLAST-downloader/

This looks really convenient :) Unfortunately, I can only use it on my home PC, which runs MS Windows. I will try it and leave feedback. Can I contact you here: http://www.dnabaser.com/download/nextgen-fastq-editor/contact.html ? As always, I must recommend that you release the source :) or at least port it to Java so that it's OS independent.

Hi Adrian. Yes, that's the right link for contacting me.

About the port: the program was written in Delphi. Some months ago I upgraded my license to Delphi 21, which can build for Win, OS X, iOS and Android (and I think some other platforms too, but not for Linux). I am still fiddling around to see how this works :) So, there will be a Mac port quite soon. Linux support will come when Delphi supports it. But since bioinformaticians are exclusively on Linux and they DON'T need my tool, Linux is not a priority anyway.

8.4 years ago
conchoecia • 0

I made a script that checks if there is a blast job currently running, waits until it is done, deletes the old dbs, then downloads the new ones and moves them into the same directory name. The script outputs everything into a dated log for archival purposes. Anyone have any suggestions for improvements?

I made this into a cron job by typing

crontab -e

...and adding this line.

0 3 1 1,4,7,10 * /<your directory to>/<the script and update_blastdb.pl>/update_blast.sh

This line above sets up a cron job that will run the script every January, April, July, and October 1st, at 3AM. So you get quarterly updates!

#!/bin/bash

# Name this file "update_blast.sh" and put it in the same directory as your 
# "update_blastdb.pl" file. The nr database will be saved to "nr/", and the taxdb
# will be saved to "taxdb/" in the same directory. Run this script via the
# terminal or via a cron job.

# cron format http://www.nncron.ru/help/EN/working/cron-format.htm
# http://askubuntu.com/questions/2368/how-do-i-set-up-a-cron-job

#change the directory of $PWD to directory of script
cd "$(dirname "$0")"

#conditional shell scripts: http://askubuntu.com/questions/157779
#bash wait for process to start

# Define a timestamp function
timestamplong() {
  date +"%Y%m%d_%H-%M-%S" 
}

timestampshort() {
  date +"%Y%m%d"
}

logfile="$(timestampshort)_blastupdate.log"

#wait until blast is done to start

echo "# $(timestamplong) Log file created. Attempting to update blast and taxdb." >> $logfile 2>&1
echo "# $(timestamplong) Waiting until blast processes are done before continuing." >> $logfile 2>&1

while ps aux | grep ' /bin/blast' | grep -v 'grep' > /dev/null
do
    sleep 1
done

echo "# $(timestamplong) Blast processes complete. Proceeding with download." >> $logfile 2>&1


echo "# $(timestamplong) Deleting current nr and taxdb databases." >> $logfile 2>&1
rm -rf nr/ 
rm -rf taxdb/ 

echo "# $(timestamplong) Starting updateblast script." >> $logfile 2>&1
perl update_blastdb.pl --verbose --decompress nr taxdb >> $logfile 2>&1

echo "# $(timestamplong) Moving the taxdb and nr databases to nr/ and taxdb/" >> $logfile 2>&1
mkdir taxdb
mv taxdb.* taxdb
mkdir nr
mv nr.* nr

echo "# $(timestamplong) Update complete! nr and taxdb are now the most recent versions." >> $logfile 2>&1
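
One possible improvement, sketched under assumptions: the script above deletes the old databases before the download starts, so a failed download leaves nothing to search against. Downloading into a staging directory and swapping it in afterwards avoids that. The helper name swap_in_new_db and the directory name nr.new are made up for illustration.

```shell
#!/bin/bash
# swap_in_new_db is a hypothetical helper: replace the live directory with
# the staging directory, but only if staging actually contains files.
swap_in_new_db() {
    local staging="$1"  # directory with the freshly downloaded database
    local live="$2"     # directory BLAST searches read from
    # Refuse to swap if the download produced nothing.
    if [ -z "$(ls -A "$staging" 2>/dev/null)" ]; then
        echo "staging directory $staging is empty; keeping $live" >&2
        return 1
    fi
    rm -rf "$live.old"
    if [ -d "$live" ]; then
        mv "$live" "$live.old" || return 1
    fi
    # Put the new copy in place; restore the old one if the move fails.
    if ! mv "$staging" "$live"; then
        if [ -d "$live.old" ]; then mv "$live.old" "$live"; fi
        return 1
    fi
    rm -rf "$live.old"
}

# Usage (run from the directory that holds update_blastdb.pl):
#   mkdir -p nr.new
#   (cd nr.new; perl ../update_blastdb.pl --decompress nr)
#   swap_in_new_db nr.new nr
```

The swap itself takes a moment rather than the whole download time, so the window in which a running search could hit a missing database is much shorter.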