9.8 years ago
n00514450
From what I read, the sequences
AACTG
AACTGAC
have 100% identity, while the sequences
AACTG
AACTC
have 80% identity. But what about this pair?
AATCG--C
AATCGAAC
I like to use (#matches)/(alignment length), where the alignment length includes gap columns, because it is symmetric: you get the same identity regardless of which sequence is the query and which is the reference. For the gapped pair above, that is 6 matches over 8 columns, or 75%. I count an N as 0.25 of a match.
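Here is a minimal sketch of that calculation in Python, in case it helps. It assumes the two sequences are already aligned and padded with "-" to the same length. The function name gapped_identity and the n_weight parameter are made up for illustration and do not come from any library.

```python
def gapped_identity(aligned_a: str, aligned_b: str, n_weight: float = 0.25) -> float:
    """Return (#matches) / (alignment length), counting gap columns in the length.

    Both inputs must be rows of the same pairwise alignment, so they have
    equal length and use '-' for gaps. A column containing an N on either
    side is scored as n_weight of a match (0.25 by default).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")

    matches = 0.0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            # Gap column: it adds to the alignment length but is never a match.
            continue
        if x == "N" or y == "N":
            # Ambiguous base: count a fractional match.
            matches += n_weight
        elif x == y:
            matches += 1.0

    return matches / len(aligned_a)


print(gapped_identity("AATCG--C", "AATCGAAC"))
print(gapped_identity("AACTG", "AACTC"))
```

Swapping the two arguments gives the same value, which is the symmetry I care about. Note that the 100% figure for AACTG versus AACTGAC only comes out if the unaligned overhang is left out of the alignment; if you pad it as AACTG-- the function above gives 5/7.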
In practice, identity is often not a good metric, because it either gives exaggeratedly low scores to sequences with long indels, or ignores indels completely, neither of which makes much sense.