Question

BLAST Database creation error

1

9.9 years ago

vigneshprbh37 ▴ 30

Hey, for the past two days I have been trying to install and run standalone BLAST (ncbi-blast-2.2.30+) on a CentOS system. I downloaded the nr reference sequences from the NCBI FTP site with the command wget ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz and extracted the file with the command tar -xvpf nr.gz. I got an nr file 33 GB in size, but when I try to format it with the command

./makeblastdb -in /home/Desktop/ncbi-blast-2.2.30+/db/nr -dbtype 'nucl' -input_type 'fasta' -out /home/Desktop/ncbi-blast-2.2.30+/output

I get this error:

BLAST Database creation error: FASTA-Reader: No residues given

Can anyone give any suggestions on the nature of the problem and how I can solve it?

alignment software-error blast • 13k views

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by vigneshprbh37 ▴ 30

0

You need a z in your tar command for gzip files: use -xvzf. The file was probably not unzipped correctly. Also, the nr database is protein ("Non-Redundant peptides"), so you will be creating a protein database. Think about whether that is what you want and why. I believe that if makeblastdb had run correctly, it would have told you that you are mixing up protein and nucleotides. If you want the whole nucleotide database, it is called nt ("Non-Translated nucleotides"). I can't remember whether those are the official acronyms or just what my brain uses to keep them the right way around.
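
To make the decompression and database-type points concrete, here is a minimal sketch run on a tiny stand-in file rather than the real download. It assumes that nr.gz is a single gzip-compressed FASTA file and not a tar archive, in which case gunzip (or gzip -d) should be enough on its own. The scratch directory, the sample record, and the nr_db output name are made up for illustration, and the makeblastdb line is left commented out because it needs BLAST+ installed.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Hypothetical stand-in for the real download: one tiny protein record.
printf '>sample_protein_1 made-up record\nMKTAYIAKQRQISFVKSHFSRQ\n' > "$workdir/nr"
gzip "$workdir/nr"                     # produces nr.gz and removes nr

# Test the compressed file's integrity, then decompress it (nr.gz -> nr).
gzip -t "$workdir/nr.gz" && gunzip "$workdir/nr.gz"

# Sanity checks before formatting: a FASTA file starts with ">" and
# should contain at least one line that is not a header.
first_char=$(head -c 1 "$workdir/nr")
non_header_lines=$(grep -vc '^>' "$workdir/nr")
echo "first character: $first_char, non-header lines: $non_header_lines"

# With BLAST+ on the PATH, a protein database could then be built with:
# makeblastdb -in "$workdir/nr" -dbtype prot -input_type fasta -out "$workdir/nr_db"
```

If the first character is not ">" or there are no non-header lines, the file was most likely not decompressed properly, which would be consistent with the "No residues given" error.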

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by Daniel ★ 4.0k

Ram · Answer 1 · 2015-03-16

6

9.8 years ago

Galo ▴ 70

OK, I think I found the problem. When making a BLAST database or running a masking algorithm such as DUST or WindowMasker, the program raises an error if it finds an empty record, that is, a record of the form:

>gi|xxxx|

>gi|yyyyy|
AGACCGATGACT

I'm sure there are many ways to remove empty records from a FASTA file. A simple and fast way is to use awk. You can copy and paste this command into the terminal (substituting the name of your file, of course):

awk -v RS=">" -v FS="\n" -v ORS="" ' { if ($2) print ">"$0 } ' your_fasta_file.fna > output.fna

For an explanation of the code, see this thread:

Removing All Empty Fasta Sequences From A File (Was: Editing The Headers Of The Fasta Format Sequence)
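
As a quick way to see what the awk filter does before running it on a large file, here is a small sketch on a made-up two-record input. The scratch directory and file names are hypothetical; the first awk call is an extra diagnostic step (not part of the original command) that lists the headers of the records that would be dropped.

```shell
workdir=$(mktemp -d)

# Made-up input: the first record has a header but no sequence.
printf '>gi|xxxx|\n\n>gi|yyyyy|\nAGACCGATGACT\n' > "$workdir/input.fna"

# Diagnostic: print the header of every record whose second line is empty.
# NR > 1 skips the empty chunk that precedes the first ">".
empty_headers=$(awk -v RS=">" -v FS="\n" 'NR > 1 && !$2 { print $1 }' "$workdir/input.fna")
echo "empty records: $empty_headers"

# Same filter as the one-liner above: keep a record only if the line
# right after its header is non-empty.
awk -v RS=">" -v FS="\n" -v ORS="" '{ if ($2) print ">"$0 }' \
    "$workdir/input.fna" > "$workdir/output.fna"

kept=$(grep -c '^>' "$workdir/output.fna")
echo "records kept: $kept"
```

One thing to keep in mind: the filter only looks at the line directly after the header, so a record with a blank line between its header and its sequence would be dropped as well. Listing the affected headers first makes that easy to spot.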

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.8 years ago by Galo ▴ 70

0

Worked for me, thanks!!

ADD REPLY • link 8.6 years ago by biotech ▴ 570

0

This occurs not only with empty records but also with completely masked ones BTW.
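
A rough sketch for spotting such records, assuming "completely masked" means hard-masked nucleotide sequences made up entirely of N (for protein data the mask character would more likely be X, and lowercase soft-masking is not covered here). The file and record names are made up.

```shell
workdir=$(mktemp -d)

# Made-up input: one fully masked record and one partially masked record.
printf '>seq_masked\nNNNNNNNN\nNNNN\n>seq_ok\nACGTNNNNACGT\n' > "$workdir/masked.fna"

# For each record, join its sequence lines and report the header
# if nothing other than N/n is left.
fully_masked=$(awk -v RS=">" -v FS="\n" 'NR > 1 {
    residues = ""
    for (i = 2; i <= NF; i++) residues = residues $i
    if (residues ~ /^[Nn]+$/) print $1
}' "$workdir/masked.fna")
echo "fully masked: $fully_masked"
```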

ADD REPLY • link 6.3 years ago by gtrwst9 • 0

Ram · Answer 2 · 2015-01-16

2

9.9 years ago

5heikki 11k

1. nr is a protein db, so -dbtype would be prot (there's no need for the apostrophes)
2. Is the filename really just "nr" after gunzip?
3. Why are you downloading the huge fasta file instead of the prebuilt db?

ADD COMMENT • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by 5heikki 11k

0

2. Yes, it is a 33.6 GB file.

3. I don't know what the difference is with a prebuilt db.

ADD REPLY • link updated 2.7 years ago by Ram 44k • written 9.9 years ago by vigneshprbh37 ▴ 30

Ram · Answer 3 · 2015-03-16

EDITED: Now I realize it wasn't the same error, so this may not work for you. Tell us whether it works, or how you solved the problem =)

Hi, I got the same error both when making a BLAST database from FASTA files and when using a masking algorithm like DUST.

For me the problem was that some of my sequences had a blank line between the last line of nucleotides of one sequence and the header of the next one, like this:

>gi|xxxx|
ATGACCGT...
[[:BLANK:]]
>gi|yyyy|
ACGATCGG...

An easy way to remove those blank lines in UNIX is with grep:

grep -v '^$' fasta_with_blanks.fna > fasta_without_blanks.fna
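
Here is a small sketch of that grep call on a made-up file, with a count of the blank lines beforehand. The scratch directory and the fasta_stripped.fna name are hypothetical. The second grep is a variant, not part of the original command, that also drops lines containing only spaces, tabs, or a Windows carriage return, which '^$' on its own would leave in place.

```shell
workdir=$(mktemp -d)

# Made-up input with one blank line between the two records.
printf '>gi|xxxx|\nATGACCGT\n\n>gi|yyyy|\nACGATCGG\n' > "$workdir/fasta_with_blanks.fna"

# Count the truly empty lines before removing them.
blank_before=$(grep -c '^$' "$workdir/fasta_with_blanks.fna")
echo "blank lines found: $blank_before"

# Original approach: drop lines that are completely empty.
grep -v '^$' "$workdir/fasta_with_blanks.fna" > "$workdir/fasta_without_blanks.fna"

# Variant: also drop lines that hold only whitespace characters.
grep -v '^[[:space:]]*$' "$workdir/fasta_with_blanks.fna" > "$workdir/fasta_stripped.fna"

lines_after=$(wc -l < "$workdir/fasta_without_blanks.fna" | tr -d ' ')
echo "lines after cleanup: $lines_after"
```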

Saludos!