I am following the OrthoMCL User Guide for my work. I used orthomclAdjustFasta to produce a compliant FASTA file, so each protein in the file has a definition line in the following format:
>xxx|yyyyyyyy
But when I run blastall:
blastall -i ALL_goodProteins.fasta -d BLL_goodProteins.fasta -p blastp -e 1e-10 -m 8 -o A-to-B.txt
I get error reports like these:
[blastall] ERROR: SeqPortNew: lcl|172_BLL_goodProteins.fasta stop(449) >= len(367)
[blastall] ERROR: SeqPortNew: lcl|172_BLL_goodProteins.fasta start(450) >= len(367)
[blastall] ERROR: SeqPortNew: lcl|172_BLL_goodProteins.fasta start(459) >= len(367)
[blastall] ERROR: SeqPortNew: lcl|172_BLL_goodProteins.fasta start(531) >= len(367)
I think maybe all sequences named "BLL|yyyyy" or "ALL|yyyyyyy" are being seen as duplicate IDs.
So I then used a non-compliant FASTA file (each protein has only a definition line of the form >yyyy) to run NCBI BLAST with -m 8. But when I passed the BLAST results to orthomclBlastParser, I got only an empty file named similarSequences.txt.
Can anyone help me? Thank you very much!
OrthoMCL really just needs those first three characters to distinguish between the two datasets when you do the all-vs-all BLAST. Whatever comes after the 'XXX|' is just the ID of the sequence in the dataset, which needs to be unique and contain no spaces for the BLAST to work. So if you reformat your FASTA files so that there are no spaces in the ID field, it should work.
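A minimal sketch of that reformatting step, assuming plain-text FASTA input. The function name sanitize_headers and the choice of an underscore as the replacement character are illustrative only and are not part of OrthoMCL. It replaces whitespace in each definition line with underscores and reports any IDs that end up duplicated:

```python
import re
from collections import Counter


def sanitize_headers(lines):
    """Return (cleaned_lines, duplicate_ids) for an iterable of FASTA lines.

    Whitespace inside a definition line is replaced with underscores so that
    the whole header is read as a single ID.
    """
    cleaned = []
    ids = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # Collapse each run of whitespace in the header into one underscore.
            header = re.sub(r"\s+", "_", line[1:].strip())
            ids[header] += 1
            cleaned.append(">" + header)
        else:
            # Sequence lines are passed through unchanged.
            cleaned.append(line)
    duplicates = sorted(h for h, n in ids.items() if n > 1)
    return cleaned, duplicates


# Made-up records for illustration only.
example = [">BLL|gene 001", "MKT", ">BLL|gene 001", "MKV", ">BLL|gene002", "MAA"]
cleaned, duplicates = sanitize_headers(example)
print("\n".join(cleaned))
print("duplicates:", duplicates)
```

If the duplicates list is not empty, the IDs would still need to be made unique by hand before running BLAST.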
Yes, I think that might be the reason. But the OrthoMCL User Guide tells me that "each protein in those files must have a definition line in the following format: >xxxx|yyyyyyyy", or else I cannot do the next steps such as orthomclBlastParser.
Thank you very much! Each of my original sequence IDs contained a space, and once I removed it, blastall ran successfully!
But I have another problem. When I run orthomclBlastParser like this: orthomclBlastParser Hsa-Ath.txt Ath >>similarSequences.txt
"Hsa-Ath.txt" is the BLAST output in m8 format. "Ath" is the directory of compliant FASTA files as produced by orthomclAdjustFasta.
But it tells me: "couldn't find taxon for gene '2_Ath.fasta' at /opt/bin/orthomclBlastParser line 103, <F> line 1." Could you help me? Thank you!
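One way to narrow this down is to check whether every ID in the BLAST table also appears in the compliant FASTA files. The sketch below assumes that the query and subject columns (columns 1 and 2 of the -m 8 output) are expected to match definition-line IDs of the form taxon|protein. The function names are hypothetical, and this is not how orthomclBlastParser itself is implemented. An ID such as 2_Ath.fasta has no taxon prefix, which suggests the BLAST output is not carrying the original definition-line IDs.

```python
def fasta_ids(fasta_lines):
    """Collect the first whitespace-delimited token of every definition line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def unknown_blast_ids(m8_lines, known_ids):
    """Return query/subject IDs from tabular BLAST output missing from known_ids."""
    unknown = set()
    for line in m8_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        for gene in fields[:2]:
            if gene not in known_ids:
                unknown.add(gene)
    return sorted(unknown)


# Made-up FASTA records and BLAST rows for illustration only.
fasta = [">Ath|AT1G01010", "MEDQ", ">Hsa|ENSP0001", "MKLV"]
blast = [
    "Hsa|ENSP0001\tAth|AT1G01010\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-20\t90.0",
    "Hsa|ENSP0001\t2_Ath.fasta\t40.0\t100\t60\t0\t1\t100\t1\t100\t1e-15\t80.0",
]
print(unknown_blast_ids(blast, fasta_ids(fasta)))
```

With real data, the lines would come from opening Hsa-Ath.txt and the files in the Ath directory. Any ID this reports is one the parser would not be able to match to a taxon under the assumption above.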
I have a similar problem, described in "Blast Gives Cryptic Errors", but I don't see any spaces.
If you wanted to use orthomclAdjustFasta on this, you would want to use 3 for the location of the ID, because that script interprets spaces and line-break characters in the header as field separators. Unless you want to keep whatever word is in the place of ID, in which case you would want to remove the space between ID and 02919.
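To see which field number to pass, it can help to split a header into numbered fields first. The sketch below assumes fields are separated by spaces or '|' and are counted from 1. The helper header_fields and the example header are made up for illustration, and this is not code from orthomclAdjustFasta.

```python
import re


def header_fields(defline):
    """Split a FASTA definition line into 1-indexed fields on spaces or '|'."""
    body = defline.lstrip(">").strip()
    # Assumption: runs of spaces and '|' characters act as one separator.
    parts = [p for p in re.split(r"[ |]+", body) if p]
    return {i: p for i, p in enumerate(parts, start=1)}


# Made-up header for illustration only.
for number, value in header_fields(">sp|P12345|GENE_X some description").items():
    print(number, value)
```

The number printed next to the token you want as the protein ID is the value to try as the ID field argument.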