I am running makeblastdb from BLAST+ version 2.2.26 on uniprot_trmbl.fasta and get the following error:
BLAST Database creation error: Error: Duplicate seq_ids are found: GNL|BL_ORD_ID|2707210
Any ideas what's going on?
Thanks.
Yes, I have the same problem with BLAST 2.4.0+:
makeblastdb -in 160509_Chinese_Spring_v0.4_pseudomolecules.fasta -hash_index -title target -dbtype nucl
the error is:
BLAST Database creation error: Error: Duplicate seq_ids are found:
GNL|BL_ORD_ID:5
In fact, there are no duplicate IDs in my FASTA file. The file is large (about 15 GB).
Solution: add the -parse_seqids option.
Well, the error message is saying that your FASTA file has duplicate IDs. BLAST tries to parse the FASTA header to obtain a unique ID (check http://www.uniprot.org/help/fasta-headers). If your database doesn't include unique IDs, you have two options: 1) remove the duplicate sequences, or 2) change the IDs to some unique key.
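For option 2, here is a minimal sketch of one way to do it, not a tested recipe for UniProt data. The function name `dedup_fasta_ids` and the `_2`, `_3` suffix scheme are my own choices for illustration. It treats the first whitespace-delimited word of each header as the ID and appends a counter to every repeat, leaving sequence lines untouched.

```shell
# Hypothetical helper: make the first word of every FASTA header unique
# by appending _2, _3, ... to repeated IDs. The suffix scheme is an
# assumption for illustration, not something makeblastdb requires.
dedup_fasta_ids() {
  awk '
    /^>/ {
      id = $1                             # first token, including the ">"
      rest = substr($0, length(id) + 1)   # description, if any
      n = ++seen[id]                      # how many times this ID has appeared
      if (n > 1) id = id "_" n            # rename the 2nd, 3rd, ... occurrence
      print id rest
      next
    }
    { print }                             # sequence lines pass through unchanged
  ' "$1"
}
```

Usage would be something like `dedup_fasta_ids db.fasta > db.unique.fasta`. Note that a generated ID such as `X_2` could in principle collide with an ID that already exists in the file, so it is worth re-running a duplicate check on the output.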
Did you find any solution to this?
I'm also getting this duplicate error when using -parse_seqids and/or max_file_sz=4GB.
I've downloaded the trEMBL data (https://www.uniprot.org/help/downloads).
Thanks
Hmm, 4 GB is a historical maximum file size for many filesystems.
It's possible (but unlikely) that TrEMBL messed up. Perhaps try something along these lines:
grep '^>' db.fasta | cut -f 1 -d ' ' | sort | uniq -d
to see if any obviously duplicated identifiers are in there.
It may also be worth formatting with a different version of BLAST. They sometimes have bugs too... but more importantly the more recent versions have become better at explaining errors, including by highlighting specific problematic lines.
Finally, if you get an explicit error, try to disentangle where it is coming from by looking for that identifier in the input file.
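One caveat: as far as I understand, the `BL_ORD_ID` values in the errors above are ordinals that makeblastdb assigns itself when it does not parse IDs from the headers, so they will not appear in the FASTA file and cannot be grepped for. A rough sketch for finding the record they point to is below. The function name `header_for_ordinal` is made up, and counting records from 0 is my assumption, so check the neighbouring records too.

```shell
# Hypothetical helper: print "line_number: header" for the record with
# the given ordinal in a FASTA file.
# ASSUMPTION: ordinals count records from 0, in input order.
header_for_ordinal() {
  awk -v n="$2" '
    /^>/ {
      if (i++ == n) {        # i is the 0-based index of this header
        print NR ": " $0
        exit                 # stop reading once the record is found
      }
    }
  ' "$1"
}
```

For example, `header_for_ordinal db.fasta 5` would show the header that an error mentioning ordinal 5 refers to under that assumption. For an identifier that does come from the file, `grep -n -F 'the_identifier' db.fasta` is enough.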
Adding -max_file_sz '10GB' (the default is 1GB) solved the problem.