We noticed that a gene key to our study (Brachyury) was not placed in any orthology group in our OMA standalone run, because it is a transcription factor with a relatively short conserved domain. Testing on a small subset suggests that lowering the LengthTol parameter to 0.4 allows it to be included.
I am concerned, though, that this will introduce dubious or spurious cluster links in the final orthology groups. Would that be the case? Is there a pruning step that guards against low intra-group homology?
Thanks!
Dear Adrian,
Thanks very much for your comment! That's quite helpful.
In fact, we are only interested in HOGs as we are using OMA as a quick way to determine gene gains, losses and duplications of a subset of genes of interest in all 5 taxa we are investigating.
Note that Brachyury is a gene (also known as "T"), not a genome. In this case, all Brachyury sequences that are incorrectly grouped are complete and verified. We found that lowering the LengthTol parameter to 0.4 groups the gene correctly in some taxa, but owing to a very short match between two taxa in our study, it is only grouped across all of them when LengthTol is as low as 0.35. The question becomes: how crucial is LengthTol to the correct formation of HOGs? Have you seen many cases where short, spurious perfect matches caused a gene to be misplaced as an in-paralog or ortholog?
Dear Robert,
Sorry about the misunderstanding. Another parameter that you might consider lowering a bit is "ReachabilityCutoff" or "MinEdgeCompletenessFraction" (depending on whether you use top-down or bottom-up HOG inference). This might be especially helpful if the Brachyury genes do have pairwise ortholog relations with the higher LengthTol.
To answer your question about false positives in HOGs when lowering the LengthTol parameter: I have certainly observed these for multidomain genes that share only a partial history, e.g. through gene fission events. Otherwise, this should not be a major problem. So my advice is to go with the lower cutoff.
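To see which pairs a given LengthTol would admit before rerunning OMA, here is a minimal Python sketch. It assumes the length criterion works as I understand it from the comments in the OMA standalone parameters file: an alignment is discarded when its length is below LengthTol * min(length(s1), length(s2)). The function names, taxon labels and all lengths below are hypothetical and only mirror the 0.4 and 0.35 situation described in this thread; 0.61 is used as an example of a stricter setting, so please check it against your own parameters file.

```python
def coverage_of_shorter(aligned_len: int, len_a: int, len_b: int) -> float:
    """Fraction of the shorter sequence that the alignment covers."""
    if len_a <= 0 or len_b <= 0 or aligned_len < 0:
        raise ValueError("lengths must be positive and aligned_len non-negative")
    return aligned_len / min(len_a, len_b)


def passes_length_tol(aligned_len: int, len_a: int, len_b: int,
                      length_tol: float) -> bool:
    """Assumed criterion: keep the pair if the alignment spans at least
    length_tol times the length of the shorter sequence."""
    return aligned_len >= length_tol * min(len_a, len_b)


if __name__ == "__main__":
    # Made-up (aligned length, length of seq 1, length of seq 2) per pair.
    pairs = {
        ("taxonA", "taxonB"): (180, 435, 450),
        ("taxonA", "taxonC"): (160, 435, 440),
    }
    for (t1, t2), (aln, la, lb) in pairs.items():
        frac = coverage_of_shorter(aln, la, lb)
        kept_at = [tol for tol in (0.61, 0.4, 0.35)
                   if passes_length_tol(aln, la, lb, tol)]
        print(f"{t1}-{t2}: coverage {frac:.2f}, kept at LengthTol in {kept_at}")
```

The coverage value of a pair is the highest LengthTol at which that pair would still be kept under this assumed rule, so computing it for each Brachyury pair shows how far the cutoff has to drop and whether a single short match is driving the choice.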