Hello everyone,
I want to run a PAML analysis using codeml. For this I have an alignment of protein-coding DNA sequences from a broad array of animals.
However, in some cases, gene orthology in my species is not 1-to-1 (i.e. some animals have more than one ortholog, and therefore more than one sequence). The problem is that PAML accepts only one sequence per species, which leaves me having to choose among these multiple sequences.
I looked through the PAML manual, and there is no guidance on this issue (which, I believe, must be fairly common). I built some trees and distance matrices, but in some cases the multiple copies of a gene are just "equally" far away from their respective orthologs, so distance alone does not tell me which one to keep.
Can you suggest any "best practice" to deal with this issue?
Thanks a lot!
In general, people focus on one-to-one orthologs (and skip one-to-many or many-to-many cases) in their analyses precisely to avoid this problem.
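A minimal sketch of that filtering step, in case it helps. It assumes you already have your orthology assignments as a mapping from a group ID to a list of (species, sequence ID) pairs; the function name `one_to_one_groups`, the group IDs, and the example data are all made up for illustration and do not come from PAML or any specific orthology tool.

```python
from collections import Counter


def one_to_one_groups(orthogroups, species_required=None):
    """Keep only groups in which every species contributes exactly one sequence.

    orthogroups: dict mapping group ID -> list of (species, sequence_id) pairs
                 (hypothetical input format, adapt to your orthology tool).
    species_required: optional iterable of species that must all be present.
    """
    kept = {}
    for group_id, members in orthogroups.items():
        counts = Counter(species for species, _ in members)
        # Drop one-to-many and many-to-many groups.
        if any(n > 1 for n in counts.values()):
            continue
        # Optionally require the full species set to be represented.
        if species_required is not None and set(counts) != set(species_required):
            continue
        kept[group_id] = {species: seq_id for species, seq_id in members}
    return kept


# Made-up example data: OG2 has two mouse copies, so it is dropped.
orthogroups = {
    "OG1": [("human", "h1"), ("mouse", "m1"), ("fly", "f1")],
    "OG2": [("human", "h2"), ("mouse", "m2a"), ("mouse", "m2b"), ("fly", "f2")],
    "OG3": [("human", "h3"), ("mouse", "m3")],
}

strict = one_to_one_groups(orthogroups, species_required=["human", "mouse", "fly"])
print(sorted(strict))
```

Leaving `species_required` unset keeps groups that are single-copy in whichever species they contain, which retains more genes at the cost of missing taxa in some alignments.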
I have read in several papers that the authors filter their data and keep only genes with one-to-one orthology, but I didn't think this problem was so "insurmountable".