I am using OrthoMCL on 4-5 species (complete proteomes) to find groups of orthologous genes. Will the evolutionary distances among these species affect my results? If species A and species B are in the same genus and all the other species are in separate phyla, will OrthoMCL be biased towards putting A and B genes into groups that contain no other species?
From my results, I seem to be getting proportionally more exclusive A-B groups, which makes sense since those two species are closer together. But when I BLAST some of the genes in the A-B groups, I get decent hits to the other species I used in OrthoMCL.
The algorithm doesn't seem to be described very well in their paper, and the source code for the orthology finding is basically a set of messy SQL calls. There does seem to be some kind of weighting procedure to normalize the BLAST scores. Does anyone have any thoughts, or suggestions for alternative methods or software?
I think OrthoMCL is essentially clustering based on similarities calculated from all-to-all BLAST results. Could you try some phylogeny-based methods? I imagine that would give you some A and B lineage-specific duplications.
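On the normalization you mention: if I read the paper right, the edge weights are divided by the average weight for the corresponding species pair before clustering, which is meant to stop a close pair like A and B from dominating just because their raw scores are higher. Here is a minimal Python sketch of that idea only. It is not OrthoMCL's actual code; the function name, the gene IDs and the scores are all made up for illustration.

```python
from collections import defaultdict


def normalize_by_species_pair(edges, species_of):
    """Divide each edge weight by the mean weight of its species pair.

    edges: dict mapping (gene_a, gene_b) -> positive similarity score
    species_of: dict mapping gene ID -> species label
    Hypothetical helper for illustration, not part of OrthoMCL.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (a, b), weight in edges.items():
        # Unordered species pair, so A-B and B-A share one average.
        pair = frozenset((species_of[a], species_of[b]))
        totals[pair] += weight
        counts[pair] += 1

    means = {pair: totals[pair] / counts[pair] for pair in totals}

    return {
        (a, b): weight / means[frozenset((species_of[a], species_of[b]))]
        for (a, b), weight in edges.items()
    }


# Made-up scores: A-B hits are much stronger than A-C hits in raw terms.
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
edges = {
    ("a1", "b1"): 200.0,
    ("a2", "b2"): 100.0,
    ("a1", "c1"): 40.0,
    ("a2", "c2"): 20.0,
}
normalized = normalize_by_species_pair(edges, species_of)
```

After this step the a1-b1 and a1-c1 edges end up with the same relative weight, even though the raw A-B score was five times larger. So if you still see an excess of A-B-only groups, it may be worth checking which pairs make it into the graph in the first place (E-value and coverage cutoffs, best-hit criteria) rather than only the weighting.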