Hi community! I am not very experienced in bioinformatics and have a (hopefully simple) problem: I am working on a gene that is expressed as two isoforms in a tissue-specific manner. Schematically, it looks like this:
Isoform 1: aaaaaaaaaaabbbbcccdddeeeeeeeeee
Isoform 2: _______aafffbbbbcccdddeeeeeeeeee
So the central / C-terminal part "bbbbcccdddeeeeeeeeee" is highly conserved within the class and, by comparison, very similar to other members of the class.

My concrete problem: I would like to describe / quantify the differences that show up when comparing the N-terminal sequences of the two isoforms. I would like to write something like: "... while the conserved C-terminal part shows a high similarity (??) / identity (??) / alignment score of xx (??), the N-terminus varies and has a similarity of (??) / identity of (??) / alignment score of xx (??)". I need to express the differences as a number.

So far I have used EMBOSS needle to quantify the identity, similarity and score of "aaaaaaaaaaa" versus "aafff", and I would like to know whether this is a useful approach at all. I know that such numbers are usually used (and were invented) to describe the evolutionary divergence of DNA / protein sequences, primarily between different species, but do you think they can also be used to describe the variation between different isoforms?

Also, since the N-terminus of isoform 2 is quite short (roughly: isoform 1: 200 aa, isoform 2: 20 aa), how should I handle things like the gap penalty?

In summary: do you think this is useful at all, or would it be better to stay descriptive? Thanks for any kind of input!
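To make concrete what kind of number I mean, here is a minimal sketch of a global (Needleman-Wunsch) alignment applied to the toy sequences above. It is only an illustration under simplifying assumptions: the function names `global_align` and `percent_identity` are my own, the scoring is a plain match/mismatch scheme with a linear gap penalty (not the substitution matrix and affine gap penalties that EMBOSS needle uses), and identity is taken over the full alignment length including gap columns, which is how I understand needle reports it.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not the EMBOSS needle defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length (gaps included)."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Toy N-terminal parts from the schematic above.
n_score, n_a, n_b = global_align("aaaaaaaaaaa", "aafff")
print(n_a)
print(n_b)
print(n_score, round(percent_identity(n_a, n_b), 1))

# Shared C-terminal part, identical in both isoforms.
c_term = "bbbbcccdddeeeeeeeeee"
c_score, c_a, c_b = global_align(c_term, c_term)
print(c_score, percent_identity(c_a, c_b))
```

With very different lengths (for example 200 aa versus 20 aa), most alignment columns are forced to be gaps, so both the score and the identity percentage are dominated by the length difference and by the chosen gap penalty rather than by the residues themselves. That is the part I am unsure how to handle.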