Consider a hypothetical situation: a human gene G performs function F, and its homolog in mouse is gene H. Overall the two sequences are not conserved (represented by dashes), but a small portion is conserved (represented by nucleotides), and that conserved region turns out to be a binding site for a protein.
Gene G (human): ------------------------ACCCGATCGATCGCAGT----------------
Gene H (mouse): ------------------------ACCCGATCGATCGCAGT----------------
With that in mind, I have two questions:
- 1) Starting from G, how do you establish that H is its homolog in mouse?
- For example, how do the HomoloGene database or Ensembl determine the homologs they list for a gene? Is it done experimentally, by checking that the genes perform a similar function in different species?
- 2) What could be the role of the non-conserved regions?
- Could the non-conserved sequences look utterly different yet share some secondary (2D) structure, or do you still need at least some sequence conservation? Which programs would be best suited to investigate that?
- Could the non-conserved regions differ because different sequences are needed to carry out the same function in different species?
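To make the scenario at the top concrete, here is a minimal Python sketch that locates the conserved block shared by two otherwise divergent sequences. The flanking bases are invented for illustration (the dashes above do not specify them), and `longest_shared_block` is a hypothetical helper name. It uses exact matching from the standard library only; this is a toy illustration, not how homology databases work.

```python
from difflib import SequenceMatcher

# Conserved core taken from the example in the question.
CORE = "ACCCGATCGATCGCAGT"

# Flanking sequences are made up for illustration only.
human = "TTGACA" + CORE + "GGATTC"
mouse = "CAGTGT" + CORE + "TACCAG"


def longest_shared_block(seq_a, seq_b):
    """Return (block, start_in_a, start_in_b) for the longest exact shared substring."""
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters, which would otherwise interfere with a 4-letter alphabet
    # on long sequences.
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    m = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return seq_a[m.a:m.a + m.size], m.a, m.b


block, pos_human, pos_mouse = longest_shared_block(human, mouse)
print(block, pos_human, pos_mouse)  # ACCCGATCGATCGCAGT 6 6
```

Exact substring matching only works here because the conserved region is identical in both toy sequences. Real comparisons have to tolerate mismatches and gaps, which is what alignment tools are for.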
Homologous structures are similar (if they are similar at all) because of common descent, not because of functional similarity; structures that are similar only in function are analogous. Sequence similarity is an indicator of common descent, not proof of it. Functional similarity is not proof of common descent either.
Thanks, I had confused homologous with analogous.