BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing.
18 months ago

I am trying to create a BLAST database containing all plant sequences in the RefSeq release. I downloaded all the FASTA files from the FTP site.

After discovering that some fasta files were larger than 1000000000 bytes, I split the overly large files into smaller fasta files using the following command:

awk 'BEGIN {n=0;} /^>/ {if(n%500==0){file=sprintf("chunk%d.fa",n);} print >> file; n++; next;} { print >> file; }' < multi.fa
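For reference, the same split can be sketched as a small shell function that closes each chunk before opening the next, which avoids keeping hundreds of output files open in awk. The function name `split_fasta` and its arguments are made up for illustration; the chunk naming follows the command above.

```shell
# Hypothetical helper: split a multi-FASTA file into chunks of N records.
# Arguments: input file, records per chunk, output prefix.
split_fasta() {
    input=$1; per_chunk=$2; prefix=$3
    awk -v n="$per_chunk" -v prefix="$prefix" '
        /^>/ {
            if (count % n == 0) {
                if (file != "") close(file)          # release the previous chunk
                file = sprintf("%s%d.fa", prefix, count)
            }
            count++
        }
        file != "" { print > file }                  # skip anything before the first header
    ' "$input"
}

# Example call matching the command above:
# split_fasta multi.fa 500 chunk
```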

Next, I proceeded to create the database using the command:

for i in *.f*a; do makeblastdb -in "$i" -dbtype nucl -taxid_map ../plant_refseq_genomic_taxidmap.txt -parse_seqids -title plantdb; done

Starting from over 1000 fasta files, I ended up with 1000 databases, each represented by 9 files (.ndb, .nhr, .nin, .nog, .nos, .not, .nsq, .ntf, .nto), that I want to group into a single alias.

I saved the list of all databases in a txt file:

plant.10.1.genomic.fna.1.fa
plant.10.1.genomic.fna.2.fa
plant.10.1.genomic.fna.3.fa
plant.10.1.genomic.fna.4.fa
plant.10.1.genomic.fna.5.fa
plant.10.1.genomic.fna.6.fa
plant.10.1.genomic.fna.7.fa
plant.10.1.genomic.fna.8.fa
plant.10.1.genomic.fna.9.fa
plant.10.2.genomic.fna
plant.10.3.genomic.fna
plant.10.4.genomic.fna
..
..

And I launched the following command:

blastdb_aliastool -dblist_file listdb.txt -dbtype nucl -out plantdb-refseq-release -title "plantdb-refseq-release"

But I am getting the following error:

BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing.

What could be the reason for this error and how can I resolve it?

Thank you for your help

blastdb blast • 1.9k views
ADD COMMENT

The file passed to -dblist_file should contain the base names of your databases. Are those names correct?
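A quick way to check is sketched below. It assumes the list contains one base name per line and that the database files sit in the current directory; `check_dblist` is a made-up helper name, not part of BLAST+. For a nucleotide database it looks for either an index file (.nin) or an alias file (.nal) next to each base name.

```shell
# Hypothetical helper: report entries in a -dblist_file that have no
# matching .nin (index) or .nal (alias) file in the current directory.
check_dblist() {
    list=$1
    missing=0
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue                   # skip blank lines
        if [ ! -e "$name.nin" ] && [ ! -e "$name.nal" ]; then
            echo "missing: $name"
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing entries without a .nin or .nal file"
    [ "$missing" -eq 0 ]                             # non-zero exit if anything is missing
}

# Example call with the list file from the question:
# check_dblist listdb.txt
```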

ADD REPLY

If the base names are the names of the .ndb, .nhr, .nin, .nog, .nos, .not, .nsq, .ntf and .nto files without the extension, then yes.

ADD REPLY
18 months ago

I also tried building the alias by adding the databases one at a time, and I managed to create it with the first 1018 of the 1031 databases. If I then add even one of the databases from the 1019th to the 1031st, I get the error again:

BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing.

So I thought there was a problem with one or all of those databases, but then I tried to make an alias with only the databases from the 1018th to the 1031st, and it worked.

Is it possible that there is a maximum number of databases that an alias can include, or a maximum total size in bytes?

ADD COMMENT

Is it possible that there is a maximum number of databases that an alias can include, or a maximum total size in bytes?

It is quite possible. What you are trying to do is likely an outlier case. You can email the NCBI help desk and ask.
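One thing that may be worth ruling out first, and this is only a guess that nothing in this thread confirms: many Linux systems default to 1024 open files per process, which is close to 1018 databases plus a handful of files that are already open. The sketch below just prints the limit of the current shell next to the number of entries in listdb.txt; `count_dbs` is a made-up helper name.

```shell
# Compare the number of listed databases with the open-file limit of the
# current shell. Whether this limit is related to the alias error is an
# untested assumption.
count_dbs() {
    grep -c . "$1"                                   # count non-empty lines
}

limit=$(ulimit -n)
echo "open-file limit: $limit"
if [ -f listdb.txt ]; then
    echo "databases listed: $(count_dbs listdb.txt)"
fi
```

If the two numbers are close, raising the limit in the same shell (for example with ulimit -n 4096, where the system allows it) before rerunning blastdb_aliastool would show whether it makes a difference.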

A new version of BLAST+ (v. 2.14) came out this week. It probably won't help, but if you are willing, you could try it to see whether it works.

The simplest workaround may be to merge some of your initial FASTA files so that the total number of databases stays under the limit you seem to have discovered.
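Another option that might be worth a try, since smaller aliases reportedly work, is to group the databases into batches and point a top-level alias at the batch aliases. The sketch below writes plain-text .nal files by hand with TITLE and DBLIST lines. It assumes base names without spaces, all in the current directory, and the helper name `write_nested_alias` is made up. Whether nesting actually sidesteps the limit is untested.

```shell
# Hypothetical helper: write one .nal alias per batch of database names,
# plus a top-level .nal that lists the batch aliases.
# Arguments: list file, databases per batch, output base name.
write_nested_alias() {
    list=$1; batch_size=$2; out=$3
    tmpdir=$(mktemp -d)
    grep . "$list" | split -l "$batch_size" - "$tmpdir/batch."
    top=""
    for piece in "$tmpdir"/batch.*; do
        name="$out.$(basename "$piece")"             # e.g. plantdb.batch.aa
        printf 'TITLE %s\nDBLIST %s\n' "$name" "$(paste -s -d ' ' "$piece")" > "$name.nal"
        top="$top $name"
    done
    printf 'TITLE %s\nDBLIST%s\n' "$out" "$top" > "$out.nal"
    rm -r "$tmpdir"
}

# Example call: batches of 500 databases under one top-level alias
# write_nested_alias listdb.txt 500 plantdb-refseq-release
```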

ADD REPLY

Unfortunately, I don't think I can merge my FASTA files: I had to split them because they were bigger than 1000000000 bytes, and makeblastdb does not support files bigger than that.

ADD REPLY

Since your databases are going to be smaller than nt or nr it should be possible to create databases from your set.

Looking at the options for makeblastdb, this is what the description of the relevant option says:

-max_file_sz            Maximum file size to use for BLAST database. 4GB is the maximum supported by the database structure.

Looks like up to 4 GB files will work. You will need a ton of RAM to create the databases (which you may have). More info here: https://www.ncbi.nlm.nih.gov/books/NBK279684/table/appendices.T.makeblastdb_application_opt/?report=objectonly
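A rough sketch of what that could look like is below: all FASTA files are streamed into a single makeblastdb call, which splits the output into volumes on its own and writes the alias for them. The function name `build_plantdb` is made up, the paths and titles are taken from the question, and 3GB is an arbitrary value chosen to stay below the 4GB ceiling quoted above. I have not run this on a dataset of this size.

```shell
# Sketch only: build one database from all FASTA files and let
# makeblastdb split it into volumes no larger than -max_file_sz.
# Reading from stdin (-in -) requires -out and -title to be set.
build_plantdb() {
    cat "$@" | makeblastdb -in - -dbtype nucl -parse_seqids \
        -taxid_map ../plant_refseq_genomic_taxidmap.txt \
        -max_file_sz 3GB \
        -out plantdb-refseq-release -title "plantdb-refseq-release"
}

# Example call, assuming the original unsplit RefSeq files end in .fna:
# build_plantdb plant.*.genomic.fna
```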

ADD REPLY

This is what I found:

"The nr and nt databases are huge (several GB's last time I checked) so when we run blast locally, on these databases, we may simply input -db nr or -db nt as appropriate but nr and nt are really aliases for multiple database files."

link

ADD REPLY

Correct, but the nt and nr aliases list fewer than 100 files in their DBLIST. So either NCBI is using an undocumented technique to create them, or you should be able to do this with fewer than 1000 databases.

ADD REPLY
