Blast Gives Cryptic Errors
12.0 years ago
hbw ▴ 90

I have a list of proteins in FASTA format (say goodProteins.fasta). I first make a compatible database using NCBI's formatdb v2.2.18:

formatdb -i goodProteins.fasta -p T -o T

This gives me a number of files (.psq, .pin, .phr, .psi, .psd).

Then, I run NCBI's blastall :

blastall -F 'm S' -v 100000 -b 100000 -z 67924 -e 1e-5  -p blastp -d goodProteins.fasta -m 8 -o blast.out -i goodProteins.fasta

This starts the blast but then proceeds to give me errors such as

[NULL_Caption] ERROR: SeqPortNew: lcl|5579_goodProteins.fasta start(384) >= len(363)
[NULL_Caption] ERROR: SeqPortNew: lcl|5579_goodProteins.fasta start(404) >= len(363)
[NULL_Caption] ERROR: SeqPortNew: lcl|5579_goodProteins.fasta start(433) >= len(363)
[NULL_Caption] ERROR: SeqPortNew: lcl|5579_goodProteins.fasta start(453) >= len(363)
[NULL_Caption] ERROR: SeqPortNew: lcl|5579_goodProteins.fasta start(472) >= len(363)
[NULL_Caption] ERROR: SeqPortNew: lcl|5579_goodProteins.fasta start(491) >= len(363)
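Each of these messages appears to report a start coordinate that lies beyond the end of a sequence BLAST believes is 363 residues long. One quick sanity check is to measure that record directly in the FASTA file and compare the numbers. The sketch below assumes the ID is the first word of the header line; `seq_length` is an illustrative helper name, not part of BLAST.

```shell
# Print the residue count of every FASTA record whose ID (first word of the
# header, without '>') equals the requested ID. One line per matching record,
# so a duplicated ID shows up as more than one number.
seq_length() {
    # $1 = sequence ID, $2 = FASTA file
    awk -v id="$1" '
        function report() { if (keep) print n }
        /^>/ { report(); keep = (substr($1, 2) == id); n = 0; next }
        keep { gsub(/[ \t\r]/, ""); n += length($0) }  # ignore whitespace and CRs
        END  { report() }
    ' "$2"
}
```

For example, `seq_length SOME_ID goodProteins.fasta`, where `SOME_ID` is a placeholder for the ID as it is written in the file. If the length printed differs from the `len` in the error, or more than one length is printed, the database lookup is probably returning a different record than the one that was aligned.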

I have uploaded all the data files at http://www.filedropper.com/datatar so anyone can reproduce the errors.

I found this answer to a related question (Some Questions About Using Orthomcl To Find Orthologs Within Many Species), but I don't see any spaces in my IDs.
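Checking IDs by eye is error-prone in a file with tens of thousands of records, so a scripted check may be more reliable. This is a minimal sketch, assuming the ID is the first whitespace-delimited word after `>`; both function names are made up for illustration.

```shell
# Print "line number: header" for every header that contains a space, a tab,
# or a carriage return (the last one hints at Windows line endings).
headers_with_whitespace() {
    awk '/^>/ && /[ \t\r]/ { print NR ": " $0 }' "$1"
}

# Print each ID that occurs more than once (reported once per ID).
duplicate_ids() {
    awk '/^>/ { id = substr($1, 2); if (seen[id]++ == 1) print id }' "$1"
}
```

Running `headers_with_whitespace goodProteins.fasta` and `duplicate_ids goodProteins.fasta` should both print nothing if the headers are clean and unique.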

blast • 4.8k views

I don't think it's directly the problem here, but (for clarity if nothing else) you shouldn't really name your BLAST database the same as your query. Also, makeblastdb might be a better choice to build your db; I think formatdb is no longer supported.


Ben, I don't think your comment is relevant because formatdb is the correct command with blastall (this isn't BLAST+) and blastall will just look for those file extensions listed in the question.

hbw, in my experience the SeqPortNew errors are usually related to using multiple, or incorrectly formatted, databases. Try using "-A F" when creating the database, and reconsider whether you need to use the "-z" option with blastall.


Yep, it's still a deprecated program though and, like I said, different names are good for clarity if nothing else (e.g. goodProteinDB + the random blast extensions).


Using makeblastdb, the errors apparently disappear. I guess I will simply upgrade to 2.2.27 (that is BLAST+, right?).

12.0 years ago
DG 7.3k

You may want to check out the answer at this link. I believe this is what I discovered when experiencing a similar problem a year or two ago:

http://www.bioinformatics.org/pipermail/bioclusters/2003-December/001357.html

The gist of it is to explicitly set -A F, even though the option should be optional and its default is supposed to be F.

I would also second the suggestion of switching to makeblastdb, assuming you are using blast+ now instead of an older version of blast and that your call to blastall is a reference to the wrapper script.

If you aren't, you really should switch over to blast+. It runs significantly faster.
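For reference, a rough BLAST+ counterpart of the commands in the question might look like the sketch below. It assumes makeblastdb and blastp are installed and on the PATH; the function names and the database name goodProteins_db are illustrative, and the option mapping in the comments is my reading of the BLAST+ options rather than something tested on this data set.

```shell
# Sketch only: assumes BLAST+ (makeblastdb, blastp) is installed and on PATH.

# Build a protein database under a name that differs from the query file.
build_protein_db() {
    # $1 = input FASTA, $2 = database name
    makeblastdb -in "$1" -dbtype prot -out "$2"
}

# All-vs-all blastp with tabular output.
# Rough legacy equivalents: -m 8 -> -outfmt 6, -e -> -evalue,
# -v/-b -> -max_target_seqs, -F 'm S' -> -seg yes -soft_masking true.
all_vs_all_blastp() {
    # $1 = query FASTA, $2 = database name, $3 = output file
    blastp -query "$1" -db "$2" -out "$3" \
        -evalue 1e-5 -outfmt 6 -max_target_seqs 100000 \
        -seg yes -soft_masking true
}
```

Usage would be `build_protein_db goodProteins.fasta goodProteins_db` followed by `all_vs_all_blastp goodProteins.fasta goodProteins_db blast.out`. If the fixed search space from `-z 67924` is really needed, `-dbsize 67924` looks like the closest BLAST+ option.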


I think we must have been typing at the same time. Hopefully, this will work for the OP.


doing

formatdb -i goodProteins.fasta -p T -o T -A F

gives [NULL_Caption] ERROR: Invalid argument: -A


Try it without the "-o T"

12.0 years ago
SES 8.6k

The problem is how your SeqIDs are being parsed, though using "-A F" does not appear to be a solution in this case. It seems that option does not exist with this version of blast. These commands worked for me with your data (I just followed my previous comment but elaborated here):

formatdb -i goodProteins_nobl.fasta -p T -o F -n goodProteins_db
blastall -v 100000 -b 100000 -e 1e-5  -p blastp -d goodProteins_db -m 8 -o goodProteins_allvsall_1e5.bln -i goodProteins_nobl.fasta

By the way, I removed the blank lines that followed every record in the FASTA file before issuing these commands (that is why I renamed your input above). I also named the output something informative :-).
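The exact command used to remove the blank lines is not given here, so the following is just one way it could be done. `strip_blank_lines` is a made-up helper name; it drops lines that are empty or contain only whitespace and leaves everything else untouched.

```shell
# Drop lines that are empty or contain only spaces, tabs, or carriage returns.
# Reads the named files, or standard input when no file is given.
strip_blank_lines() {
    awk '!/^[ \t\r]*$/' "$@"
}
```

For example: `strip_blank_lines goodProteins.fasta > goodProteins_nobl.fasta`.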

EDIT: I used NCBI BLAST v2.2.18 for this test to be consistent with the question, and I agree with others that switching to blast+ is a good long-term solution. Switching to blast+ isn't necessary for this problem, but you're likely to get more help with blast+ in the future.
