5.8 years ago
ricardoguerreiro2121
Hello,
A simple question: what is the sequence identity between two sequences when one is much longer than the other?
Example:
seq1: -------------------AGTGTGAAAAAGGT----------------
seq2: ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT
The smaller sequence matches a subset of the bigger one exactly. Do they have 100% identity? Or rather something like 30%, as seq1 covers about 30% of seq2?
The reason why I ask this is that I am filtering an alignment of two assemblies of the same genome (with nucmer/mumer) and I can filter out aligned contigs based on identity.
Thank you,
Ricardo
I would say that if you look at it from seq1, it has 100% identity over 100% of its length; if you look at it from seq2, it has 100% identity over 30% of its length. It's a matter of point of view.
I would say seq1 is 100% identical to seq2, while seq2 is only 30% identical to seq1.
Unfortunately, this depends heavily on how you look at it.
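To make the two points of view concrete, here is a minimal sketch using the sequences from the question. The helper name `percent_identity` is made up for illustration, and this is not necessarily how nucmer or any other aligner reports identity; it only shows how the choice of denominator changes the number.

```python
seq1 = "AGTGTGAAAAAGGT"
seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"


def percent_identity(matches: int, denominator: int) -> float:
    """Matching bases as a percentage of the chosen denominator."""
    return 100.0 * matches / denominator


# seq1 occurs exactly inside seq2, so every base of seq1 is a match.
assert seq1 in seq2
matches = len(seq1)

# Point of view of seq1: matches divided by the length of seq1.
from_seq1 = percent_identity(matches, len(seq1))

# Point of view of seq2: the same matches divided by the length of seq2.
from_seq2 = percent_identity(matches, len(seq2))

print(f"seq1 vs seq2: {from_seq1:.1f}%")
print(f"seq2 vs seq1: {from_seq2:.1f}%")
```

The number of matches is the same in both cases (14); only the length you divide by differs (14 for seq1, 49 for seq2).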
This is a relevant blog post: https://lh3.github.io/2018/11/25/on-the-definition-of-sequence-identity
Great, that's it, thanks! It depends on what is the query and what is the reference. Thanks! (If you write it as an answer instead of a comment I'll accept it)
It also depends on whether you use global or local alignment.
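A rough sketch of why the alignment mode matters, assuming identity is computed as matching columns divided by alignment columns. The function `column_identity` and its `trim_end_gaps` flag are hypothetical names for illustration: keeping the end gaps mimics a global alignment, trimming them mimics a local one. Real tools differ in how they count gaps, so check the documentation of the one you use.

```python
def column_identity(aln_a: str, aln_b: str, trim_end_gaps: bool) -> float:
    """Percent of alignment columns that are identical, non-gap pairs."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aln_a, aln_b))
    if trim_end_gaps:
        # Local-like view: drop leading and trailing columns containing a gap.
        while columns and "-" in columns[0]:
            columns.pop(0)
        while columns and "-" in columns[-1]:
            columns.pop()
    if not columns:
        raise ValueError("no aligned columns left")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# The alignment from the question: seq1 padded with end gaps against seq2.
aln_seq1 = "-" * 19 + "AGTGTGAAAAAGGT" + "-" * 16
aln_seq2 = "ATATATGCGCATGGTAATAAGTGTGAAAAAGGTTATATGCGCATAAGGT"

global_like = column_identity(aln_seq1, aln_seq2, trim_end_gaps=False)
local_like = column_identity(aln_seq1, aln_seq2, trim_end_gaps=True)

print(f"end gaps counted (global-like): {global_like:.1f}%")
print(f"end gaps trimmed (local-like):  {local_like:.1f}%")
```

With the end gaps counted, the 14 matches are spread over 49 columns; with them trimmed, the same 14 matches fill all 14 remaining columns.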