5.0 years ago
I'm trying to build a custom database with the makeblastdb command-line tool from a large FASTA file (about 3.0 GB). Here is my command:
makeblastdb -in /Users/myname/custom_hg19.fa -dbtype nucl -title hg19 -out /Users/myname/blast_dbs/hg19
The result is:
Building a new DB, current time: 11/14/2019 15:48:40
New DB name: /Users/myname/genome_references/blast_dbs/hg19
New DB title: hg19
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1 sequences in 0.00405002 seconds.
Only one sequence? Also, the resulting files are nowhere near the size they should be. Any ideas what I'm doing wrong?
Are you sure you have all the chromosomes there? Try:
grep -c ">" custom_hg19.fa
I ran this command and it outputs 46 as the count. If I rerun it as
grep ">" custom_hg19.fa
I get...
Weird. Is it BLAST+ v2.10.0 by any chance?
Check your input FASTA file, and also try adding a DB name. What is the size of your FASTA file?
The FASTA file is ~3.0 GB. By DB name, did you mean the -out argument? I tried
-out /path/to/db/dbname
and also
-out dbname
and neither worked.
Hi dear,
Try a small file first and see whether you are able to build a BLAST database from it. If so, cross-check your FASTA file again: look for empty FASTA headers, and also check for empty lines in the file.
I hope these checks help.
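The checks suggested above can be sketched with standard tools. This is only a rough sketch: the toy file written to "$fasta" is hypothetical and stands in for custom_hg19.fa, and the carriage-return check is an extra assumption on my part (it was not suggested in this thread), included only because stray line endings are cheap to rule out.

```shell
# Hypothetical toy input; point "$fasta" at the real file instead.
fasta=$(mktemp)
printf '>chr1\nACGT\n\n>\nGGCC\n>chr2\r\nTTAA\r\n' > "$fasta"

# Lines that are empty or contain only whitespace
empty_lines=$(grep -c '^[[:space:]]*$' "$fasta")

# Header lines with nothing after the ">"
empty_headers=$(grep -c '^>[[:space:]]*$' "$fasta")

# Lines containing a carriage return (extra check, an assumption)
cr_lines=$(grep -c "$(printf '\r')" "$fasta")

echo "empty lines:   $empty_lines"
echo "empty headers: $empty_headers"
echo "lines with CR: $cr_lines"
```

On a clean file all three counts should be 0. Note that grep -c still prints 0 in that case but exits with a non-zero status, so avoid running this under set -e.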
If this is one chromosome, you should only have one sequence in the FASTA file.
You can count the number of sequences in your FASTA file with
grep '>' custom_hg19.fa | wc -l
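A slightly stricter variant, as a sketch: anchoring the pattern with ^ counts only lines that start with ">", and a short awk pass prints the length of each record, which shows whether the sequence data is really split across the headers. The toy file in "$fasta" is hypothetical; substitute the real path.

```shell
# Hypothetical toy input; point "$fasta" at the real file instead.
fasta=$(mktemp)
printf '>chr1 desc\nACGT\nAC\n>chr2\nGGCC\n' > "$fasta"

# Count only lines that begin with ">" (true header lines)
n_headers=$(grep -c '^>' "$fasta")
echo "headers: $n_headers"

# Print "<id> <sequence length>" for every record
lengths=$(awk '
  /^>/ { if (name != "") print name, len   # flush the previous record
         name = substr($1, 2); len = 0; next }
       { len += length($0) }               # accumulate sequence characters
  END  { if (name != "") print name, len }
' "$fasta")
echo "$lengths"
```

If the header count is 46 but makeblastdb still reports one sequence, comparing the per-record lengths against the expected chromosome sizes may help narrow down where the file and the tool disagree.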
Yup, did that (see previous comment)