Downloading the NCBI BLAST nt database
11 months ago
Sony ▴ 20

Hi everyone,

I am trying to download the NCBI BLAST nt database with this command:

./update_blastdb.pl nt --decompress --num_threads 16

But it only finished some of the files, and then I got this error:

The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.017.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.018.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.019.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.020.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.021.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.022.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.023.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.024.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.025.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.026.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.027.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.028.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.029.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.030.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.031.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.032.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.033.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.034.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.035.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.036.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.037.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.038.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.039.tar.gz are up to date in your system.
The contents of https://ftp.ncbi.nlm.nih.gov/blast/db/nt.040.tar.gz are up to date in your system.
Downloading https://ftp.ncbi.nlm.nih.gov/blast/db/nt.041.tar.gz...curl: (56) OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0
corrupt download, trying again.
Downloading https://ftp.ncbi.nlm.nih.gov/blast/db/nt.041.tar.gz...curl: (23) Failed writing body
curl: (23) Failed writing body
Use of uninitialized value $_ in split at ./update_blastdb.pl line 718.
Use of uninitialized value $rmt_digest in string ne at ./update_blastdb.pl line 619.
corrupt download, trying again.
Downloading https://ftp.ncbi.nlm.nih.gov/blast/db/nt.041.tar.gz...curl: (23) Failed writing body
curl: (23) Failed writing body
Use of uninitialized value $_ in split at ./update_blastdb.pl line 718.
Use of uninitialized value $rmt_digest in string ne at ./update_blastdb.pl line 619.
too many failures, aborting download!

I tried many more times to download the nt database, but I got the same error every time. Is there any solution to this problem? Or is there a way to download the NCBI BLAST nt database other than downloading each nt* file one by one with wget?

Thank you.

While not the exact errors you have noted here, we have also encountered errors downloading pre-formatted databases from NCBI for the last 2-3 months. The process seems to fail on random files and may take several iterations before it completes.

I am going to tag PeterC_NCBI who works at NCBI to see if he can shed some light.

Thanks! Looking at this now.

If it helps: most of the time, the failures are caused by a missing .md5 file for a specific (random) piece.

Good to know. Thanks.

I had the BLAST development team look at this. They say the errors here are curl reporting network errors and errors writing to the local disk:

CURLE_RECV_ERROR (56): Failure with receiving network data.
CURLE_WRITE_ERROR (23): An error occurred when writing received data to a local file, or an error was returned to libcurl from a write callback.
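If you fetch files yourself, both error types can be handled in a small retry loop. This is a minimal bash sketch, assuming curl is installed; fetch_with_retry and RETRY_DELAY are hypothetical names, not part of the BLAST tools. It retries every failure itself, because curl's own --retry only retries errors curl classifies as transient (timeouts and certain HTTP/FTP response codes), and it flags exit code 23 as a likely local problem such as a full disk or missing write permission.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of update_blastdb.pl): download one URL,
# retrying on any curl failure.
#   curl exit 56 = CURLE_RECV_ERROR  (network side)
#   curl exit 23 = CURLE_WRITE_ERROR (local side: disk full, permissions, ...)
fetch_with_retry() {
  local url="$1" out="$2" tries="${3:-5}" attempt rc
  for attempt in $(seq 1 "$tries"); do
    if curl --fail --silent --show-error -o "$out" "$url"; then
      return 0
    else
      rc=$?  # exit status of the curl call in the condition above
    fi
    if [ "$rc" -eq 23 ]; then
      echo "curl exit 23: could not write $out; check free space and permissions" >&2
    fi
    echo "attempt $attempt/$tries for $url failed (curl exit $rc)" >&2
    rm -f "$out"  # never keep a partial file around
    if [ "$attempt" -lt "$tries" ]; then
      sleep "${RETRY_DELAY:-10}"
    fi
  done
  return 1
}

# Usage (URL layout as in the log above):
# fetch_with_retry https://ftp.ncbi.nlm.nih.gov/blast/db/nt.041.tar.gz nt.041.tar.gz
```

This does not check the archive's checksum; it only makes the transfer step more persistent.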

As for the missing .md5 files, they think this can happen if the download takes place while the databases are being updated.
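Since a missing or stale .md5 file is one of the failure modes here, it can help to check each archive against its checksum yourself before extracting it. A minimal sketch, assuming GNU coreutils md5sum and that each .md5 file starts with the hex digest (the usual "<hash>  <file>" md5sum layout); verify_md5 is a hypothetical name. A missing or empty .md5 file is treated as a failure rather than a pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if ARCHIVE matches the digest in ARCHIVE.md5.
# Assumes the .md5 file's first field is the hex digest.
verify_md5() {
  local archive="$1"
  local md5file="${archive}.md5"
  # A missing .md5 (e.g. fetched during a database update) is a failure.
  if [ ! -s "$md5file" ]; then
    echo "missing or empty $md5file" >&2
    return 1
  fi
  local expected actual
  expected=$(awk '{ print $1; exit }' "$md5file")
  actual=$(md5sum "$archive" | awk '{ print $1 }')
  [ "$expected" = "$actual" ]
}

# Usage:
# verify_md5 nt.041.tar.gz && tar -xzf nt.041.tar.gz
```

Comparing the digests directly, instead of running md5sum -c, means the check works no matter which directory you call it from.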

They have created tickets to improve error handling when file downloads fail, and are investigating whether the updates to the FTP site affect downloading BLAST databases from NCBI.

You have several hundred GB of free space available, correct? Last I looked, the nt database was over 350 GB.
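Because curl's exit code 23 can mean the disk filled up, a quick pre-flight check is cheap. A minimal sketch using POSIX df -Pk (sizes in 1024-byte blocks); free_gb and have_space_for are hypothetical helpers, and the 400 GB threshold in the usage line is an assumption chosen to leave headroom above the size quoted above, not an NCBI figure.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before a large BLAST database download.
# Prints the free space, in whole GiB, of the filesystem holding a directory.
free_gb() {
  # -P: POSIX output (one line per filesystem), -k: 1024-byte blocks.
  # Column 4 of the second line is the available space.
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeeds if at least $1 GiB is free in directory $2 (default: current dir).
have_space_for() {
  local need_gb="$1" dir="${2:-.}"
  [ "$(free_gb "$dir")" -ge "$need_gb" ]
}

# Usage (threshold is an assumption):
# have_space_for 400 /data/blastdb || echo "not enough free space" >&2
```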

Yes, I have 4.1 TB of space for the download.

You could try setting up your own download solution by fetching the nt files directly with wget or curl from: https://ftp.ncbi.nih.gov/blast/db/
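One way to do this with wget, as a sketch: generate the URL of every volume archive and its .md5 file, then hand the list to wget, which can resume partial files (--continue) and retry failed transfers (--tries). nt_urls is a hypothetical helper. The numbering nt.000, nt.001, ... follows the file names in this thread, and the last index (124 at the time of the question further down) changes as nt grows, so check the directory listing first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the URLs of nt volumes 0..LAST and their .md5 files.
# BASE defaults to the NCBI BLAST db directory used in the log above.
nt_urls() {
  local last="$1" base="${2:-https://ftp.ncbi.nlm.nih.gov/blast/db}" i
  for i in $(seq 0 "$last"); do
    printf '%s/nt.%03d.tar.gz\n' "$base" "$i"
    printf '%s/nt.%03d.tar.gz.md5\n' "$base" "$i"
  done
}

# Usage (check the current last volume number before running):
# nt_urls 124 > nt_urls.txt
# wget --continue --tries=10 --input-file=nt_urls.txt
```

wget's --continue decides a file is complete by comparing sizes only, so run it in a fresh directory rather than over an older copy of nt.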

Otherwise, keep re-running the update script until it succeeds.
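Re-running is cheaper than it sounds because, as the log above shows, the script skips volumes that are already up to date. A minimal bash sketch of an automatic retry loop; rerun_until_ok and RETRY_DELAY are hypothetical names, and it assumes update_blastdb.pl exits with a non-zero status when it aborts (worth confirming on your version).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command until it exits 0, at most MAX times.
#   rerun_until_ok MAX command [args...]
rerun_until_ok() {
  local max="$1" n
  shift
  for n in $(seq 1 "$max"); do
    if "$@"; then
      return 0
    fi
    echo "attempt $n/$max failed: $*" >&2
    if [ "$n" -lt "$max" ]; then
      sleep "${RETRY_DELAY:-60}"  # give the server (or an ongoing update) time
    fi
  done
  return 1
}

# Usage:
# rerun_until_ok 10 ./update_blastdb.pl nt --decompress --num_threads 16
```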

Hi all, I am constructing a rice pangenome. After assembling my de novo contigs, I want to detect contamination from non-green plant species and fungi. In the reference article, they used BLAST to search the assembled contigs against the NCBI nt database to detect contamination from non-green plant species and fungi. Here is the list of files in the BLAST nt database. I know that nt_euk contains eukaryotic sequences, nt_prok prokaryotic sequences, nt_viruses viral sequences, and nt_others other sequences. But in the file list I also see nt.000.tar.gz through nt.124.tar.gz. What database do these files make up?

If I focus on contamination from non-green plant species and fungi, which nt files should I download (nt_euk and what else)? Thank you all.

This is a different question altogether and should be posted as such.

Your new post should have the title:

How to construct a BLAST database for non-green plant species

Then describe what you have tried, as you did above.

11 months ago

Try downloading the file manually and see what happens. See the files here:

It is as simple as

curl -O https://ftp.ncbi.nlm.nih.gov/blast/db/nt.041.tar.gz

then extract it with tar -xzf (it is a gzipped tar archive, not a zip file).

You can easily automate the above command for all the other files.
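A sketch of that automation in bash, assuming curl, GNU md5sum and tar are installed, and that each .md5 file on the server uses the "<hash>  <file>" layout that md5sum -c expects. get_nt_volume and get_all_nt are hypothetical names; BASE and LAST are parameters you supply. It keeps going after a failed volume, prints which one failed, and returns non-zero at the end if any volume failed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: download, check and extract nt volumes 0..LAST
# into the current directory.
get_nt_volume() {
  local base="$1" name="$2"
  curl --fail --silent --show-error -O "$base/$name" || return 1
  curl --fail --silent --show-error -O "$base/$name.md5" || return 1
  # md5sum -c reads "<hash>  <file>" and checks <file> in the current directory.
  md5sum -c --quiet "$name.md5" || return 1
  tar -xzf "$name"
}

get_all_nt() {
  local base="$1" last="$2" i name failed=0
  for i in $(seq 0 "$last"); do
    name=$(printf 'nt.%03d.tar.gz' "$i")
    if ! get_nt_volume "$base" "$name"; then
      echo "FAILED: $name" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage (check the current last volume number first):
# get_all_nt https://ftp.ncbi.nlm.nih.gov/blast/db 124
```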

I have had some problems with the Perl wrapper in the past; really, all it does is avoid downloading files whose hash has not changed.
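That skip-if-unchanged idea is easy to reproduce in your own loop. A minimal sketch, assuming you keep each volume's .md5 file from the previous download next to the data; needs_update is a hypothetical helper and is not taken from update_blastdb.pl. If the remote .md5 cannot be fetched (the failure mode mentioned earlier in this thread), it errs on the side of downloading again.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if volume NAME should be downloaded,
# i.e. the remote .md5 cannot be fetched, there is no local .md5 yet, or the
# two digests differ. Run it in the directory holding the local .md5 files.
needs_update() {
  local base="$1" name="$2" remote
  remote=$(curl --fail --silent --show-error "$base/$name.md5") || return 0
  [ -s "$name.md5" ] || return 0
  # Compare only the first field (the hex digest) of each file.
  [ "$(awk '{ print $1; exit }' "$name.md5")" != \
    "$(printf '%s\n' "$remote" | awk '{ print $1; exit }')" ]
}

# Usage:
# needs_update https://ftp.ncbi.nlm.nih.gov/blast/db nt.041.tar.gz && echo "fetch nt.041"
```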
