What Is Nucmer's "Percent Identity" Formally?
13.3 years ago
Nikolay Vyahhi ★ 1.3k

The output of the nucmer --coords command is a .coords file with a [% IDY] column. This column contains floats up to 100, such as 100.00, 99.05 and 90.49.

The manual doesn't say exactly what "percent identity" means, but uses the term heavily. Is it an edit-distance-based similarity or something more complicated?

Thank you.

distance similarity • 3.8k views
13.3 years ago
Neilfws 49k

You might be over-thinking this problem?

So far as I'm aware, if an alignment between 2 sequences is, for example, 200 bases long and the sequences have the same base at 70 positions, that's 35% identity. Nothing more complicated than that.
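To make that arithmetic concrete, here is a minimal Python sketch of the definition. The function name `percent_identity` and the toy sequences are made up for illustration; this is not taken from MUMmer's code.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment positions where both sequences have the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # Count positions where the two bases agree.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b)
    return 100.0 * matches / len(aligned_a)


# Toy 200-base alignment in which only the first 70 positions agree.
seq_a = "A" * 200
seq_b = "A" * 70 + "C" * 130
print(percent_identity(seq_a, seq_b))  # 35.0
```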

EDIT: more information

You need to look at the MUMmer source code (the src/ directory in the downloaded distribution).

You'll see files named sw_align.cc, sw_align.hh and sw_alignscore.hh in src/tigr. One would guess that the "SW" stands for "Smith-Waterman", a commonly-used alignment algorithm.

You can also use grep to search the source files for terms such as "percent" or "identity". For example, percent identity is defined in src/tigr/delta.hh as a float named idy.

And yes, I believe that indels are allowed, judging by the presence of functions to process them.
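If indels are allowed, one common convention is to treat every gap column as a non-match and divide the number of matching columns by the total number of alignment columns. The Python sketch below illustrates that convention only; whether nucmer uses exactly this denominator is an assumption I have not confirmed from the source, and the function name and gap character are hypothetical.

```python
def gapped_percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns, counting gap columns as non-matches.

    Assumption: the denominator is the number of alignment columns (including gaps).
    This is one common convention, not necessarily what nucmer does.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column matches only if both characters are the same base (not a gap).
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 10 columns, one insertion in each sequence, 8 matching columns.
ref = "ACGT-ACGTA"
qry = "ACGTTACG-A"
print(gapped_percent_identity(ref, qry))  # 80.0
```

A Hamming-style measure would reject this pair outright because the ungapped sequences cannot be compared column by column once an indel shifts them, which is why the presence of indel-handling code matters for interpreting the number.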

But does this alignment use edit distance, Hamming distance or some other scoring? E.g. do we allow indels or not?

Edited my answer to provide more information.

Is the % IDY computed with respect to the longer or the shorter of the two lengths?
