I am not sure if this is a problem or if in fact the process is correct. Any help is much appreciated.
I am trying to make BLAST databases from assembly FASTA files, and am seeing the above error. It generated BLAST database files, but how do I know they are correct?
I followed these steps:
1) Downloaded assembly fasta file archive
site: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips
file: http://hgdownload.cse.ucsc.edu/goldenpath/hg19/bigZips/chromFa.tar.gz
2) Unpacked the file
tar zxvf chromFa.tar.gz
3) Ran makeblastdb
/home/sean/blast/ncbi-blast-2.2.29+/bin/makeblastdb -dbtype nucl -title chr1.fa.blast -in ../chr1.fa -parse_seqids
4) Received an error
Building a new DB, current time: 09/04/2014 13:18:53
New DB name: ../chr1.fa
New DB title: chr1.fa.blast
Sequence type: Nucleotide
Deleted existing BLAST database with identical name.
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Error: (1431.1) FASTA-Reader: Warning: FASTA-Reader: First data line in seq is about 100% ambiguous nucleotides (shouldn't be over 40%)
Adding sequences from FASTA; added 1 sequences in 20.1816 seconds.
5) Output files generated
-rw-rw-r-- 1 sean sean 62359693 Sep 4 13:19 chr1.fa.nsq
-rw-rw-r-- 1 sean sean 59 Sep 4 13:19 chr1.fa.nsi
-rw-rw-r-- 1 sean sean 18 Sep 4 13:19 chr1.fa.nsd
-rw-rw-r-- 1 sean sean 36 Sep 4 13:19 chr1.fa.nog
-rw-rw-r-- 1 sean sean 96 Sep 4 13:19 chr1.fa.nin
-rw-rw-r-- 1 sean sean 43 Sep 4 13:19 chr1.fa.nhr
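One sanity check that might help with the "are they correct" question is to compare the sequence lengths in the FASTA file against the lengths stored in the database. This is only a rough sketch: both function names are made up for illustration, and it assumes blastdbcmd (which ships in the same bin directory as makeblastdb) is on the PATH. The ID column that blastdbcmd prints may be formatted differently from the FASTA header, so the lengths are the part to compare.

```shell
# Hypothetical helper: print "name length" for every record in a FASTA file.
fasta_seq_lengths() {
    awk '
        /^>/ {
            if (name != "") print name, len
            name = substr($1, 2)   # header up to the first space, minus ">"
            len = 0
            next
        }
        { len += length($0) }      # add up the residues on each sequence line
        END { if (name != "") print name, len }
    ' "$1"
}

# Hypothetical helper: print "accession length" for every sequence in a BLAST
# database. Needs blastdbcmd installed; not called unless you run it yourself.
db_seq_lengths() {
    blastdbcmd -db "$1" -entry all -outfmt "%a %l"
}

# Example usage (paths taken from the steps above):
#   fasta_seq_lengths ../chr1.fa
#   db_seq_lengths ../chr1.fa
```

If the two lengths agree for chr1, the database at least contains the whole sequence. Running blastdbcmd -db ../chr1.fa -info should also print a summary of the sequence count and total bases.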
6) The start of the assembly file does contain a lot of N's
➜ hg19 head chr1.fa
>chr1
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
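To see how the warning relates to the file, here is a small sketch that measures what percentage of the first sequence line is ambiguous. The function name is hypothetical, and treating "anything other than A, C, G, T or U" as ambiguous is my assumption, not necessarily the exact rule the FASTA reader applies.

```shell
# Hypothetical helper: print the percentage (rounded down) of ambiguous
# characters in the first non-empty sequence line of a FASTA file.
# Assumption: "ambiguous" means any character other than A, C, G, T or U.
first_line_ambiguous_percent() {
    awk '
        !/^>/ && length($0) > 0 {
            amb = gsub(/[^ACGTUacgtu]/, "")   # count and strip ambiguous characters
            total = amb + length($0)          # ambiguous + remaining unambiguous
            printf "%d\n", 100 * amb / total
            exit                              # only the first data line matters here
        }
    ' "$1"
}

# Example usage:
#   first_line_ambiguous_percent chr1.fa
```

For the chr1.fa shown above the first data line is all N's, so this should report 100, which matches the "about 100% ambiguous nucleotides" in the warning.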
Thanks, I thought so, just wanted to sanity check my process. I know the BLAST databases can be downloaded from the NIH, but I am just trying to own the process.
Hi Devon! I have the same issue as sfcaroll, except my sequences don't have a single "n" in them. Should I be concerned about this error?