Question

How much sequence similarity is needed to classify genes or proteins as isoforms, redundant, orthologs, paralogs, divergent, or convergent?

3.6 years ago

WUSCHEL ▴ 860

I am a biologist. I recently attended a 3-day symposium on plant signaling. By the end of it I was lost, puzzled, and confused after listening to many talks from biologists, bioinformaticians, and mathematicians: these scientific terms (isoforms, redundancy, ortholog, paralog, divergence, convergence) seemed to be used interchangeably, and not always with the meanings I learned in theory classes.

My questions:

Does sequence similarity play a role in defining gene families, gene isoforms, protein isoforms, gene redundancy, orthologs, paralogs, divergence, and convergence?
If yes, how much similarity must two genes/proteins share for us to say they are isoforms, etc.?
What biological/chemical properties do bioinformaticians use to identify or predict whether two or more sequences are isoforms or redundant?

Evolution Protein Gene Genetics

updated 2.3 years ago by Ram 45k

score 3 · Accepted Answer · 2021-11-08

The question is really more about biology than bioinformatics, but I will try to clarify it for you anyway. IMHO these terms should not be used interchangeably (though perhaps usage differs in the plant research field; feel free to correct me). I will not define all of them here, since a simple internet search will give you good enough definitions.

Protein isoforms: By definition, protein isoforms are proteins expressed from the same gene. As a consequence, their sequences are often very similar, but similarity is not a requirement for calling them isoforms. Indeed, protein isoforms can also be quite different, even with entirely different domains, depending on alternative splicing, transcription start site usage, post-transcriptional modifications, etc.
Redundancy: A gene is redundant with another one when it plays a role in a given process, but that role is not essential to the process. Redundancy is defined experimentally, by deletion or knockdown of the gene. Typically, a redundant gene becomes essential for the process after the deletion of another gene. As you can see, the concept of redundancy is tightly tied to the study of phenotypes after genetic perturbations, so I don't think it can be predicted bioinformatically without such experimental data.
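To make the two definitions concrete, here is a minimal Python sketch. It assumes hypothetical inputs: a protein-to-gene mapping (in practice this would come from a genome annotation) and a table of knockout outcomes. All gene and protein names are made up for illustration. Note that neither function looks at sequence similarity at all: isoforms are called from the gene of origin, and redundancy from phenotypes after deletions.

```python
from collections import defaultdict

# Hypothetical toy annotation: protein ID -> ID of the gene it is expressed from.
PROTEIN_TO_GENE = {
    "protA_1": "GENE_A",
    "protA_2": "GENE_A",  # alternative isoform of GENE_A; its sequence could differ a lot
    "protB_1": "GENE_B",
}

# Hypothetical knockout results: set of deleted genes -> is the process disrupted?
PHENOTYPES = {
    frozenset(): False,
    frozenset({"GENE_A"}): False,
    frozenset({"GENE_B"}): False,
    frozenset({"GENE_A", "GENE_B"}): True,
}


def group_isoforms(protein_to_gene):
    """Group proteins by gene of origin; each group is one set of isoforms."""
    groups = defaultdict(set)
    for protein, gene in protein_to_gene.items():
        groups[gene].add(protein)
    return dict(groups)


def are_isoforms(p1, p2, protein_to_gene):
    """Two distinct proteins are isoforms if they come from the same gene."""
    return p1 != p2 and protein_to_gene[p1] == protein_to_gene[p2]


def is_redundant_with(gene, other, phenotypes):
    """Return True if each single knockout leaves the process intact but the
    double knockout disrupts it; None if the needed knockouts were never measured."""
    keys = [frozenset({gene}), frozenset({other}), frozenset({gene, other})]
    if any(k not in phenotypes for k in keys):
        return None  # no experimental data: redundancy cannot be called
    single_gene, single_other, double = (phenotypes[k] for k in keys)
    return (not single_gene) and (not single_other) and double


if __name__ == "__main__":
    print(group_isoforms(PROTEIN_TO_GENE))
    print(are_isoforms("protA_1", "protA_2", PROTEIN_TO_GENE))
    print(is_redundant_with("GENE_A", "GENE_B", PHENOTYPES))
    print(is_redundant_with("GENE_A", "GENE_C", PHENOTYPES))
```

With these toy inputs, `protA_1` and `protA_2` are isoforms regardless of how similar their sequences are, `GENE_A` and `GENE_B` come out as redundant, and asking about a gene with no knockout data returns `None`, which reflects the point that redundancy cannot be inferred without experiments.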

PS: There is already a biostars textbook.