When I run orthomclBlastParser like this:
orthomclBlastParser Hsa-Ath.txt ~/my_orthomcl_dir/compliantFasta >>similarSequences.txt
- "Hsa-Ath.txt" is the BLAST output in m8 format.
- "~/my_orthomcl_dir/compliantFasta/" is the directory of compliant FASTA files (Hsa.fasta and Ath.fasta) produced by orthomclAdjustFasta.
But it fails with: "couldn't find taxon for gene '2_Ath.fasta' at /opt/bin/orthomclBlastParser line 103, <f> line 1."
What is causing this error? Could anyone help me? Thank you!
The FASTA headers in ~/my_orthomcl_dir/compliantFasta should look something like "org1|unique_protein_id". Everything up to the pipe character ("|") should be the taxon id, so if your protein IDs themselves contain pipe characters, you might run into trouble. Can you post the output from
head -n1 ~/my_orthomcl_dir/compliantFasta/*
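To check every header at once rather than only the first line of each file, here is a minimal Python sketch. The script name `check_headers.py` and its functions are hypothetical helpers, not part of OrthoMCL. It flags any header whose first token is not exactly one taxon, one pipe, and a non-empty protein ID. Treating extra pipes as an error is an assumption drawn from the warning above.

```python
import sys
from pathlib import Path


def split_header(header_line):
    """Return (taxon, protein_id) for a '>taxon|protein_id' header, else None."""
    tokens = header_line.lstrip(">").split()
    if not tokens:
        return None  # empty header
    taxon, sep, protein_id = tokens[0].partition("|")
    if not sep or not taxon or not protein_id:
        return None  # no pipe, or an empty taxon / protein id
    if "|" in protein_id:
        return None  # extra pipes make the taxon/id split ambiguous (assumption)
    return taxon, protein_id


def check_fasta_dir(fasta_dir):
    """Yield (file name, header) for every header that is not taxon|protein_id."""
    for path in sorted(Path(fasta_dir).expanduser().iterdir()):
        if not path.is_file():
            continue
        with path.open() as handle:
            for line in handle:
                if line.startswith(">") and split_header(line) is None:
                    yield path.name, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_headers.py ~/my_orthomcl_dir/compliantFasta
    bad = list(check_fasta_dir(sys.argv[1]))
    for name, header in bad:
        print(f"{name}: non-compliant header {header!r}")
    sys.exit(1 if bad else 0)
```

Any header it reports is likely one that orthomclBlastParser cannot map to a taxon.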
Continuing from the other question you posted: how did you run the all-vs-all BLAST? Did you remove the spaces from the IDs in the two FASTA files (Hsa and Ath) and then concatenate them for the BLAST? Can you post a few lines of the BLAST result?
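The gene named in the error, '2_Ath.fasta', looks like a file name rather than a "taxon|protein_id" ID, so the IDs in the BLAST output are worth checking too. The Python sketch below assumes, as the error message suggests, that orthomclBlastParser looks up the query and subject IDs (columns 1 and 2 of the 12-column m8 table) among the header IDs in compliantFasta. The script name `check_blast_ids.py` and its functions are hypothetical.

```python
import sys
from pathlib import Path


def fasta_ids(fasta_dir):
    """Collect the first token of every header line (e.g. 'Hsa|ENSP0001')."""
    ids = set()
    for path in Path(fasta_dir).expanduser().iterdir():
        if not path.is_file():
            continue
        with path.open() as handle:
            for line in handle:
                tokens = line[1:].split() if line.startswith(">") else []
                if tokens:
                    ids.add(tokens[0])
    return ids


def unknown_blast_ids(m8_path, known_ids):
    """Return (line number, id) for query/subject IDs missing from known_ids."""
    missing = []
    with open(m8_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # skip blank, comment or malformed lines
            for seq_id in fields[:2]:  # column 1 = query, column 2 = subject
                if seq_id not in known_ids:
                    missing.append((lineno, seq_id))
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_blast_ids.py Hsa-Ath.txt ~/my_orthomcl_dir/compliantFasta
    problems = unknown_blast_ids(sys.argv[1], fasta_ids(sys.argv[2]))
    for lineno, seq_id in problems[:20]:
        print(f"line {lineno}: {seq_id!r} is not a compliantFasta header ID")
    sys.exit(1 if problems else 0)
```

If this reports IDs like '2_Ath.fasta', the BLAST was probably run on sequences whose headers differ from the compliant FASTA files.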