Hi Everyone,
I am new to Biostars. I am having trouble finding a concrete answer to the post's question.
My understanding is that sequence similarity is the fraction of residues that are similar between two different protein sequences. Percent identity is the percentage of characters that match exactly between two different sequences.
I read that sequence similarity is strongly correlated with percent identity. I also read that it is a subset of percent identity. These two statements contradict each other.
Can someone help me distinguish between the two concepts? Thanks
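To make the two definitions above concrete, here is a minimal sketch that counts both quantities over a pair of already aligned protein sequences. The similarity groups, the function name, the example sequences, and the choice of alignment length as the denominator are all assumptions made for illustration; real tools usually decide which residues are "similar" from a scoring matrix such as BLOSUM62.

```python
# Illustrative groups of chemically similar amino acids. This grouping is an
# assumption for the sketch, not a standard; alignment tools normally use a
# substitution matrix to decide which residue pairs count as similar.
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def identity_and_similarity(a, b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both sequences must already be aligned to the same length, with '-'
    marking gaps. The denominator is the alignment length (an assumption;
    other conventions exist).
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    identical = 0
    similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # a gap column is neither identical nor similar
        if x == y:
            identical += 1
            similar += 1  # every identical position also counts as similar
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    n = len(a)
    return 100.0 * identical / n, 100.0 * similar / n


# Hypothetical example pair: M=M, K~R, V~I, L=L, E~D
ident, sim = identity_and_similarity("MKVLE", "MRILD")
print(ident, sim)  # 40.0 100.0
```

Under this counting, identical positions are a subset of similar positions, so percent similarity can never be lower than percent identity.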
Hi! Have you read this webpage? I think it is nicely explained :)
Thank you for the link.
Looking at the link and these sequences:
A: AAGGCTT
B: AAGGC
I understand this has 100% identity. How is this 60% similar?
Edit distance is the minimal number of edit operations (inserts, deletes, and substitutions) needed to transform one sequence into an exact copy of the other sequence being aligned.
Similar = 1 - (edit distance / unaligned length of the shorter sequence)
Therefore, similar = 1 - (2/2), which is 0. Not sure how the author got 60%. Either the author made a typo in the definition of similar or the math is wrong.
Can someone explain? Thanks.
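One way to check the formula quoted above is to compute it directly. The sketch below assumes that "unaligned length of shorter sequence" means the full length of the shorter sequence before alignment (5 for B), not the number of residues left unaligned (2). The function names are hypothetical, and the edit distance is a plain Levenshtein implementation rather than the method of any particular tool.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of inserts, deletes, and
    substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletes
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(s, t):
    """1 - edit distance / length of the shorter sequence.

    Assumption: the denominator is the full, unaligned length of the
    shorter sequence.
    """
    shorter = min(len(s), len(t))
    if shorter == 0:
        raise ValueError("both sequences must be non-empty")
    return 1 - edit_distance(s, t) / shorter


a, b = "AAGGCTT", "AAGGC"
print(edit_distance(a, b))  # 2 (delete the two trailing T's)
print(similarity(a, b))     # 0.6
```

With that reading, the edit distance is 2 and the shorter sequence has length 5, so similar = 1 - 2/5 = 0.6, which matches the 60% figure.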