Question

What Is Nucmer's "Percent Identity" Formally?

13.6 years ago

Nikolay Vyahhi ★ 1.3k

The output of the nucmer --coords option is a .coords file with a [% IDY] column. This column contains floats up to 100, such as 100.00, 99.05 and 90.49.

The manual doesn't say exactly what "percent identity" means, but it uses the term heavily. Is it an edit-distance-based similarity, or something more complicated?

Thank you.

Answer 1 · 2011-08-18

13.6 years ago

Neilfws 49k

You might be over-thinking this problem?

As far as I'm aware, if an alignment between two sequences is, for example, 200 bases long and the sequences have the same base at 70 positions, that's 35% identity (70 / 200 = 0.35). Nothing more complicated than that.
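The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the two sequences are already aligned into equal-length strings with '-' marking gap columns (a hypothetical input format chosen for the example, not something nucmer writes), and that identity means identical columns divided by all alignment columns.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of a pairwise alignment given as two gapped strings.

    Assumption for illustration: the denominator is the number of alignment
    columns, and a column counts as identical only if both bases are equal
    and neither side is a gap ('-'). Comparison is case-insensitive.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have the same length")
    if not aln_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for x, y in zip(aln_a.upper(), aln_b.upper()) if x == y and x != "-"
    )
    return 100.0 * identical / len(aln_a)


# The worked example: a 200-column alignment with 70 identical positions.
ref = "A" * 70 + "C" * 130
qry = "A" * 70 + "G" * 130
print(percent_identity(ref, qry))  # 35.0
```

Under this definition a gap column lowers the percentage exactly like a mismatch does, because it adds to the denominator but never to the count of identical columns.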

EDIT: more information

You need to look at the MUMmer source code (the src/ directory in the downloaded distribution).

You'll see files named sw_align.cc, sw_align.hh and sw_alignscore.hh in src/tigr. One would guess that the "SW" stands for "Smith-Waterman", a commonly-used alignment algorithm.
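For readers unfamiliar with it, a textbook Smith-Waterman local alignment can be sketched as follows. This is not a transcription of MUMmer's sw_align.cc: the linear gap penalty and the match/mismatch/gap scores are hypothetical values chosen only for illustration. It shows how an alignment containing indels (gap columns marked '-') is produced before any identity is counted.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[str, str]:
    """Textbook Smith-Waterman local alignment with a linear gap penalty.

    The scores are illustrative assumptions, not MUMmer's parameters.
    Returns the locally aligned substrings of a and b, with '-' for gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # first row/column stay 0
    best, best_pos = 0, (0, 0)

    # Fill the DP matrix; local alignment clamps every cell at 0.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + step:      # (mis)match column
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:         # gap in b
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:                                      # gap in a
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("ACGTACGT", "ACGACGT"))
```

With these scores the second sequence, which lacks one T, aligns as a single gap column rather than as a run of shifted mismatches.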

You can also use grep to search the source files for terms such as "percent" or "identity". For example, percent identity is defined in src/tigr/delta.hh as a float named idy.

And yes, I believe that indels are allowed, judging by the presence of functions to process them.
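To illustrate why allowing indels matters here, the sketch below compares a naive position-by-position (Hamming-style) identity on the unaligned sequences with identity counted over a gapped alignment. Both functions and the example sequences are hypothetical; the gapped alignment is written out by hand rather than produced by nucmer.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Hamming-style identity: compare position i of a with position i of b.

    No indels are allowed, so only the first min(len(a), len(b)) positions
    are compared.
    """
    n = min(len(a), len(b))
    return 100.0 * sum(1 for x, y in zip(a, b) if x == y) / n


def gapped_identity(aln_a: str, aln_b: str) -> float:
    """Identity over alignment columns, where '-' marks a gap (an indel)."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)


ref = "ACGTACGT"
qry = "ACGACGT"  # ref with the T at position 4 deleted

print(round(ungapped_identity(ref, qry), 2))    # 42.86: every later base is shifted
print(gapped_identity("ACGTACGT", "ACG-ACGT"))  # 87.5: 7 identical columns out of 8
```

A single deletion drags the Hamming-style figure down sharply, while the gapped version charges it as one column, which is why the presence of indel handling in the source matters for interpreting % IDY.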

But does this alignment use edit distance, Hamming distance or some other scoring? E.g. do we allow indels or not?

Reply by Nikolay Vyahhi ★ 1.3k, 13.6 years ago

Edited my answer to provide more information.

Reply by Neilfws 49k, 13.6 years ago

Is % IDY computed with respect to the longer sequence length or the shorter one?
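This thread doesn't settle which denominator % IDY uses. As a hedged illustration of why the choice matters, the sketch below computes three candidate percentages for the same hand-written gapped alignment: identical columns divided by the alignment length (gaps included), by the shorter ungapped sequence, and by the longer one. The function name and the example alignment are hypothetical.

```python
def identity_by_denominator(aln_a: str, aln_b: str) -> dict:
    """Percent identity of one gapped alignment under three denominators.

    Which of these (if any) nucmer/show-coords uses is NOT established here;
    this only shows how much the choice can move the number.
    """
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    len_a = len(aln_a.replace("-", ""))  # ungapped length of sequence A
    len_b = len(aln_b.replace("-", ""))  # ungapped length of sequence B
    return {
        "alignment_columns": 100.0 * identical / len(aln_a),
        "shorter_sequence": 100.0 * identical / min(len_a, len_b),
        "longer_sequence": 100.0 * identical / max(len_a, len_b),
    }


print(identity_by_denominator("ACGTACGT", "ACG-ACGT"))
# {'alignment_columns': 87.5, 'shorter_sequence': 100.0, 'longer_sequence': 87.5}
```

Here the shorter-sequence denominator reports 100% even though the alignment contains an indel, so it is worth checking the definition in the source before comparing % IDY values across tools.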

Reply by gkamath, 11.2 years ago