I am trying to create a BLAST database containing all plant sequences in the RefSeq release. I downloaded all the FASTA files from the FTP site.
After discovering that some fasta files were larger than 1000000000 bytes, I split the overly large files into smaller fasta files using the following command:
awk 'BEGIN {n=0;} /^>/ {if(n%500==0){file=sprintf("chunk%d.fa",n);} print >> file; n++; next;} { print >> file; }' < multi.fa
Next, I proceeded to create the database using the command:
for i in *.f*a; do makeblastdb -in $i -dbtype nucl -taxid_map ../plant_refseq_genomic_taxidmap.txt -parse_seqids -title plantdb; done
Starting from over 1000 fasta files, I ended up with 1000 databases, each represented by 9 files (.ndb, .nhr, .nin, .nog, .nos, .not, .nsq, .ntf, .nto), that I want to group into a single alias.
I saved the list of all databases in a txt file:
plant.10.1.genomic.fna.1.fa
plant.10.1.genomic.fna.2.fa
plant.10.1.genomic.fna.3.fa
plant.10.1.genomic.fna.4.fa
plant.10.1.genomic.fna.5.fa
plant.10.1.genomic.fna.6.fa
plant.10.1.genomic.fna.7.fa
plant.10.1.genomic.fna.8.fa
plant.10.1.genomic.fna.9.fa
plant.10.2.genomic.fna
plant.10.3.genomic.fna
plant.10.4.genomic.fna
..
..
And I launched the following command:
blastdb_aliastool -dblist_file listdb.txt -dbtype nucl -out plantdb-refseq-release -title "plantdb-refseq-release"
But I am getting the following error:
BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing.
What could be the reason for this error and how can I resolve it?
Thank you for your help
The file passed to -dblist_file should contain the base names of your databases. Are those names correct?
If the base names are the names of the .ndb, .nhr, .nin, .nog, .nos, .not, .nsq, .ntf, .nto files without the extension, then yes.
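One way to verify this before running blastdb_aliastool is to check every entry in the list against the files on disk. The snippet below is only a sketch under a few assumptions: check_dblist is a hypothetical helper name, the databases are nucleotide databases in the current directory (so each base name should have a .nin index file, or a .nal alias file if the database was split into volumes), and the list has one base name per line. It also strips carriage returns, in case the list was edited on Windows.

```shell
# Hypothetical helper: print every entry of a database list file that has
# no matching nucleotide BLAST database in the current directory.
# Assumption: a database with base name NAME has NAME.nin (index file)
# or NAME.nal (alias file for a multi-volume database).
check_dblist() {
    # $1: path to the list file, one database base name per line
    tr -d '\r' < "$1" | while IFS= read -r name || [ -n "$name" ]; do
        # Skip empty lines
        [ -z "$name" ] && continue
        if [ ! -e "$name.nin" ] && [ ! -e "$name.nal" ]; then
            printf 'missing: %s\n' "$name"
        fi
    done
}
```

Running check_dblist listdb.txt from the directory that holds the databases prints one "missing:" line for each entry that does not resolve to a database. Placeholder lines in the list and names with stray whitespace would show up here as well.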