Cannot make a BLAST database with the makeblastdb command-line tool
5.0 years ago
b10hazard ▴ 30

I'm trying to build a custom database with the makeblastdb command-line tool from a large FASTA file (about 3.0 GB). Here is my command:

makeblastdb -in /Users/myname/custom_hg19.fa -dbtype nucl -title hg19 -out /Users/myname/blast_dbs/hg19

The output is:

Building a new DB, current time: 11/14/2019 15:48:40
New DB name:   /Users/myname/genome_references/blast_dbs/hg19
New DB title:  hg19
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1 sequences in 0.00405002 seconds.

Only one sequence? Also, the resulting files are nowhere near the size they should be. Any ideas what I'm doing wrong?
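One way to make the mismatch explicit is to compare the number in the "added N sequences" line of the makeblastdb log with the number of header lines in the FASTA file. This is a minimal sketch, not something from the thread: it assumes a POSIX shell with sed and grep, assumes the log line has the format shown above, and the function name check_added is hypothetical.

```shell
#!/bin/sh
# check_added LOG FASTA  (hypothetical helper, not part of BLAST+)
# Compares the "added N sequences" count in a saved makeblastdb log
# with the number of lines starting with ">" in the FASTA file.
check_added() {
    # Take the last "added N sequences" number found in the log.
    added=$(sed -n 's/.*added \([0-9][0-9]*\) sequences.*/\1/p' "$1" | tail -n 1)
    # grep -c prints 0 (and exits 1) when there are no headers; we only need the count.
    headers=$(grep -c '^>' "$2")
    if [ "$added" = "$headers" ]; then
        echo "OK: $added sequences"
    else
        echo "MISMATCH: makeblastdb added ${added:-?}, FASTA has $headers headers" >&2
        return 1
    fi
}
```

Usage: capture the log with makeblastdb -in custom_hg19.fa -dbtype nucl -title hg19 -out hg19 > makeblastdb.log 2>&1, then run check_added makeblastdb.log custom_hg19.fa. With the numbers in this thread (1 sequence added, 46 headers) it would report a mismatch.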

blastn makeblastdb • 4.4k views

Are you sure you have all the chromosomes there? Try grep -c ">" custom_hg19.fa
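grep -c only counts header lines. A slightly richer check also prints each record's sequence length, so empty or truncated records stand out. This is a sketch under the assumption that a standard awk is available; the function name fasta_lengths is hypothetical.

```shell
#!/bin/sh
# fasta_lengths FILE  (hypothetical helper)
# Prints "<header text><TAB><sequence length>" for every record.
fasta_lengths() {
    awk '
        /^>/ {                                 # a new record starts
            if (n) print name "\t" len         # flush the previous record
            n++; name = substr($0, 2); len = 0
            next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # count residues, ignoring whitespace and CR
        END { if (n) print name "\t" len }           # flush the last record
    ' "$1"
}
```

For example, fasta_lengths custom_hg19.fa | head. On a whole-genome file this reads all ~3 GB, so expect it to take a while.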


I ran that command and it reports a count of 46. If I rerun it as grep ">" custom_hg19.fa, I get:

>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY
>chrM
>chr1_gl000191_random
>chr1_gl000192_random
>chr4_gl000193_random
>chr4_gl000194_random
>chr7_gl000195_random
>chr8_gl000196_random
>chr8_gl000197_random
>chr9_gl000198_random
>chr9_gl000199_random
>chr9_gl000200_random
>chr9_gl000201_random
>chr11_gl000202_random
>chr17_gl000203_random
>chr17_gl000204_random
>chr17_gl000205_random
>chr17_gl000206_random
>chr18_gl000207_random
>chr19_gl000208_random
>chr19_gl000209_random
>chr21_gl000210_random

Weird. Is it BLAST+ v2.10.0 by any chance?

$ makeblastdb -version
makeblastdb: 2.2.18+
Package: blast 2.2.18, build Oct 14 2008 16:26:16

Dear,

Check your input FASTA file, and also try adding a DB name. What is the size of your FASTA file?


The FASTA file is ~3.0 GB. By DB name, did you mean the -out argument? I tried both -out /path/to/db/dbname and -out dbname, and neither worked.


Hi dear,

Try a small file first and see whether you can build a BLAST database from it. If that works, cross-check your FASTA file again: look for empty FASTA headers and for empty lines in the file.

I hope these checks help.
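The checks suggested above can be scripted. This is a sketch, not from the thread; it assumes POSIX sh and awk, and both function names are hypothetical. fasta_lint reports empty headers, blank lines, sequence lines before the first header and (an extra check not mentioned above) CR/Windows-style line endings, exiting non-zero if it finds any. fasta_head writes the first N records to stdout so a small test file can be built.

```shell
#!/bin/sh
# fasta_lint FILE  (hypothetical helper)
# Prints one message per problem found and exits 1 if there were any.
fasta_lint() {
    awk '
        /\r$/           { cr++ }                    # CRLF (Windows) line ending
        /^[ \t\r]*$/    { blank++; next }           # empty or whitespace-only line
        /^>[ \t\r]*$/   { seen = 1; bad++
                          printf "line %d: empty header\n", NR; next }
        /^>/            { seen = 1; next }          # normal header
        !seen           { bad++
                          printf "line %d: sequence before first header\n", NR }
        END {
            if (cr)    { bad++; printf "%d line(s) end in CR\n", cr }
            if (blank) { bad++; printf "%d blank line(s)\n", blank }
            exit (bad > 0)
        }
    ' "$1"
}

# fasta_head N FILE  (hypothetical helper)
# Writes the first N records of FILE to stdout.
fasta_head() {
    awk -v n="$1" '
        /^>/      { k++ }           # count headers seen so far
        k > n + 0 { exit }          # stop at the (N+1)th header
        k >= 1    { print }         # skip anything before the first header
    ' "$2"
}
```

Usage: fasta_lint custom_hg19.fa, then fasta_head 1 custom_hg19.fa > chr1_only.fa and makeblastdb -in chr1_only.fa -dbtype nucl -out chr1_test to see whether a single-chromosome database builds.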


If this is one chromosome, you should only have one sequence in the FASTA.

You can count the number of sequences in your FASTA with grep '>' custom_hg19.fa | wc -l


Yup, did that (see previous comment)

5.0 years ago
Asaf 10k

My only advice is to upgrade to 2.9.0.
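To confirm which makeblastdb is actually being picked up before building, its version can be compared against 2.9.0. A minimal sketch, assuming the "makeblastdb: X.Y.Z+" output format shown earlier in the thread and a POSIX shell with awk; version_ge and makeblastdb_version are hypothetical names.

```shell
#!/bin/sh
# version_ge A B  (hypothetical helper)
# Succeeds if dotted version A >= B, comparing field by field as numbers.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        gsub(/[^0-9.]/, "", a); gsub(/[^0-9.]/, "", b)   # drop suffixes such as "+"
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {                       # missing fields count as 0
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0                                           # versions are equal
    }'
}

# makeblastdb_version  (hypothetical helper)
# Prints the version from "makeblastdb -version", e.g. "2.2.18+".
makeblastdb_version() {
    makeblastdb -version | awk -F': ' '/^makeblastdb:/ { print $2; exit }'
}
```

Usage: version_ge "$(makeblastdb_version)" 2.9.0 || echo "makeblastdb is older than 2.9.0". Comparing fields numerically means 2.10.0 correctly sorts above 2.9.0, which a plain string comparison would get wrong.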


That worked. But the system version I'm trying to mimic is still 2.2.18, which is why I was using that version to begin with. Looks like I'll have to talk to my system admin... Thanks for the help!
