I ran into the following error when trying to build a database using makeblastdb (NCBI BLAST 2.2.23+).
> makeblastdb -in uniprot90.faa -dbtype prot -parse_seqids
Building a new DB, current time: 08/30/2010 12:00:11
New DB name: uniprot90.faa
New DB title: uniprot90.faa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Error: invalid string size parameter in function: basic_string::__getRep(size_t,size_t)
size: -2 is greater than maximum size: -51
I ran the corresponding command with the older version (formatdb) and got no errors, so I'm assuming it's not an issue with the sequence data. Does anyone know what might have caused this problem?
I just saw that the latest release is now 2.2.24 (ftp://ftp.ncbi.nih.gov/blast/executables/blast+/2.2.24/); can you try with that? If it still fails, a reproducible example will be needed: could you try with just the first few entries of your input, or put the FASTA file online?
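To cut the input down to its first few entries, something like the following might do. It is only a sketch: the function names and the output file name `subset.faa` are made up for illustration, and the file is read in binary mode so that any odd bytes in the headers are copied through unchanged.

```python
def first_fasta_records(lines, n):
    """Yield the lines that belong to the first n FASTA records.

    `lines` is any iterable of bytes lines (e.g. a file opened with "rb").
    """
    seen = 0
    for line in lines:
        if line.startswith(b">"):
            seen += 1
            if seen > n:
                break
        if seen:  # ignore anything that precedes the first header
            yield line


def write_subset(src, dst, n=100):
    """Copy the first n records of FASTA file `src` into `dst` (hypothetical helper)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.writelines(first_fasta_records(fin, n))
```

For example, `write_subset("uniprot90.faa", "subset.faa", 100)` followed by `makeblastdb -in subset.faa -dbtype prot -parse_seqids` would show whether the error already appears on a small input.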
Some more wild guesses: try without -parse_seqids, since the problem could be caused by, e.g., non-UTF-8 characters in the FASTA headers. Otherwise, NCBI will need a reproducible example anyway.
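A quick way to look for odd characters in the headers is sketched below. Note that it checks for something slightly broader than invalid UTF-8: it flags any header containing a byte outside printable ASCII (tabs are allowed), on the assumption that such headers are the most likely to trip up ID parsing. The function name is hypothetical.

```python
def suspicious_headers(lines):
    """Yield (line_number, header) for FASTA headers with non-printable-ASCII bytes.

    `lines` is any iterable of bytes lines (e.g. a file opened with "rb").
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(b">"):
            continue  # sequence line, not a header
        header = line.rstrip(b"\r\n")
        # Flag control characters (other than tab) and anything above "~".
        if any((b < 0x20 and b != 0x09) or b > 0x7E for b in header):
            yield lineno, header
```

It could be run as `with open("uniprot90.faa", "rb") as fh: print(list(suspicious_headers(fh)))`; an empty list would suggest the headers are plain ASCII and the cause lies elsewhere.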
I'm guessing you're on a 32-bit system and your file is larger than the maximum size shown in your paste, so a size_t value overflows. Try using a smaller database or a 64-bit system.
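Both parts of this guess can be checked roughly as follows. This is a sketch with caveats: `pointer_bits` reports the pointer size of the Python interpreter, which is only assumed to match the makeblastdb binary, and the limit is simply the "Maximum file size" value copied from the log above. The function names are hypothetical.

```python
import os
import struct

# "Maximum file size" as printed in the makeblastdb log, in bytes.
MAX_FILE_SIZE = 1073741824


def pointer_bits():
    """Return the pointer width (32 or 64) of the running Python interpreter."""
    return struct.calcsize("P") * 8


def exceeds_limit(path, limit=MAX_FILE_SIZE):
    """Return True if the file at `path` is larger than `limit` bytes."""
    return os.path.getsize(path) > limit
```

For instance, `print(pointer_bits(), exceeds_limit("uniprot90.faa"))` gives a first indication of whether the overflow explanation is plausible.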
I thought that might be the problem as well, but I am on a 64-bit system, and I tried both making the DB smaller and allowing a bigger DB; neither helped.
This must be a bug in 2.2.23, because there were no issues with 2.2.24. Thanks!