I am using blastp from BLAST 2.4.0+ with the output format -outfmt 6. Some of my results are:
PSTr|PSTG_00001T0 PSTr|PSTG_00001T0 100.000 138 0 0 1 138 1 138 3.41e-101 286
MLi|212373 gnl|MLi|212373 100.000 110 0 0 1 110 1 110 8.97e-78 226
PSTr|PSTG_00001T0 gnl|PST|PstP_06241T0 98.182 110 2 0 29 138 1 110 1.75e-77 226
PSTr|PSTG_00001T0 PSTr|PSTG_14461T0 72.993 137 36 1 1 137 1 136 4.63e-67 200
PSTr|PSTG_00001T0 gnl|PST|PstP_16337T0 72.593 135 36 1 3 137 1 134 2.45e-65 196
PSTr|PSTG_00001T0 gnl|PST|PstP_17038T0 67.669 133 41 2 6 137 46 177 3.64e-55 172
My first question is: why do some of my subject IDs have the gnl| prefix in the second column, while others do not?
In fact, I am running OrthoMCL. If I use the BLAST results above as input for the next OrthoMCL step, for example by running
$ orthomclBlastParser my.blast myadjust.directory > similarSequences.txt
then I get this error:
couldn't find taxon for gene 'gnl|MLi|212373' at /path/to/orthomclBlastParser line 105, <F> line 1.
So my second question is: can I just delete the string gnl| from my BLAST results and then continue with OrthoMCL?
Thanks in advance.
I don't know the answer to your first question, but for the second: yes, I believe you can remove the "gnl|" prefix (after making a backup of the original file).
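One way to do the removal, as a rough sketch rather than a tested OrthoMCL recipe: strip a leading "gnl|" from the subject ID (column 2) only, so nothing else on the line is touched. The helper name strip_gnl and the demo file names are made up for illustration, and the sketch assumes the file is tab-separated, as -outfmt 6 output normally is.

```shell
# Hypothetical helper: remove a leading "gnl|" from column 2 of tabular BLAST output.
strip_gnl() {
    awk 'BEGIN { FS = OFS = "\t" } { sub(/^gnl[|]/, "", $2); print }' "$1"
}

# Demo on one hit taken from the question (fields joined by tabs).
printf 'MLi|212373\tgnl|MLi|212373\t100.000\t110\t0\t0\t1\t110\t1\t110\t8.97e-78\t226\n' > demo.blast
cp demo.blast demo.blast.bak            # keep a backup of the original
strip_gnl demo.blast.bak > demo.blast   # rewrite the file with the prefix removed
```

For the real data the same pattern would be: cp my.blast my.blast.bak, then strip_gnl my.blast.bak > my.blast. If the query column also carried the prefix, the same sub() call could be repeated on $1.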
edit: for the first question, maybe it is related to how you created the BLAST databases? Did you concatenate the sequences and create the database all at once? What commands did you use?