Question

What's the precise definition for sensitivity and specificity for alignment?

1

8.8 years ago

scchess ▴ 640

When we talk about the sensitivity and specificity for NGS read alignments, what do we really mean?

For example, the BWA paper talks about sensitivity. How would we define the true positives and false negatives? My guess (relative to a known genome):

TP: Number of reads that are aligned exactly and correctly (no gaps, no mismatches)

FN: Number of reads that fail to map but should have mapped (they come from the known genome)

Is my definition correct? Is this what we mean when we say alignment sensitivity? What about specificity? Can we define specificity for alignment (not mentioned in the BWA paper)?

In other words, my question is about what we really mean when we talk about sensitivity and specificity in alignments.

genome sequence bwa alignment • 5.2k views

ADD COMMENT • link updated 2.4 years ago by Ram 44k • written 8.8 years ago by scchess ▴ 640

0

Forget about the bwa paper. It is not quite right. Sorry for the confusion.

ADD REPLY • link 8.8 years ago by lh3 33k

Ram · Answer 1 · 2016-02-08

3

8.8 years ago

lelle ▴ 830

This is indeed a tricky question, because even the definition of "aligned exactly and correctly" is not that easy.

If we simulate reads, then we know where they come from and how they were created. But what if, by chance, after the random errors are introduced, the read mathematically maps better somewhere else? What if there are multiple mathematically best hits? This paper has some further thoughts.

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 8.8 years ago by lelle ▴ 830

0

Thanks. But I'm still unsure how the author in the paper calculates the alignment sensitivity. There is no Methods section for a precise definition.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 8.8 years ago by scchess ▴ 640

score 2 · Answer 2 · 2016-02-08

2

8.8 years ago

Devon Ryan 104k

The Wikipedia article on this is surprisingly good. The biggest issue with your definition is that TP alignments can contain gaps and mismatches, since reads are often simulated to contain them. Sensitivity and specificity are calculated with in silico generated datasets, so errors/variants are added in to see how an aligner's output is affected. Consequently, you tend to get a breakdown of the numbers by MAPQ (at which point sensitivity and specificity aren't terribly useful terms).

ADD COMMENT • link 8.8 years ago by Devon Ryan 104k
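
For reference, the standard textbook definitions of the two terms can be written as below. How TP, FN, FP and TN map onto alignments is an assumption here, not something the thread or the BWA paper fixes. One possible reading for simulated reads: TP is a read placed at its true origin, FN is a read that is not (unmapped or misplaced), FP is a read reported at a wrong position, and TN only exists if the dataset deliberately includes reads that should not map at all.

```latex
\[
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
\]
```

Under that reading, a dataset in which every simulated read comes from the reference has no true negatives, so specificity is undefined for it.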

0

Thanks. Do you have any reference as to how sensitivity and specificity are calculated for an in silico dataset? I'm looking for a precise calculation. For example, what does a TP mean? Thanks.

ADD REPLY • link 8.8 years ago by scchess ▴ 640

0

There's not much to calculate: you typically want to ensure that a read's alignment overlaps the position its original sequence was drawn from. When you generate in silico data, you put the true mapping coordinates in the read name. After mapping, you check whether there's an overlap. Ideally the coordinates would match exactly, but since you typically add variants and indels to the reads, it's not terribly useful to be so strict. Just have a look through the source code of wgsim, or sherman, or the rabema tool that lelle posted (I hadn't heard of that one before, it looks interesting), or one of the hundred other simulators out there. Almost all of them come with a function to check whether an alignment is correct.

ADD REPLY • link 8.8 years ago by Devon Ryan 104k
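
The overlap check described in the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions: the read-name layout `<chrom>_<start>_<end>_<id>` is hypothetical (it is not claimed to be the wgsim, sherman or rabema format), the `Alignment` record and all function names are made up for the example, and counting misplaced reads as missed for sensitivity is one convention among several.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one read after mapping."""
    read_name: str
    chrom: Optional[str] = None  # None means the read was left unmapped
    start: int = 0               # 0-based, inclusive
    end: int = 0                 # 0-based, exclusive


def parse_truth(read_name: str) -> Tuple[str, int, int]:
    # Assumed naming scheme: '<chrom>_<start>_<end>_<id>'.
    # rsplit keeps contig names that contain underscores intact.
    chrom, start, end, _read_id = read_name.rsplit("_", 3)
    return chrom, int(start), int(end)


def is_correct(aln: Alignment) -> bool:
    """True if the alignment overlaps the interval the read was drawn from."""
    if aln.chrom is None:
        return False
    chrom, start, end = parse_truth(aln.read_name)
    return aln.chrom == chrom and aln.start < end and start < aln.end


def count_outcomes(alignments: Iterable[Alignment]) -> Tuple[int, int, int]:
    """Return (correct, wrong, unmapped) read counts."""
    correct = wrong = unmapped = 0
    for aln in alignments:
        if aln.chrom is None:
            unmapped += 1
        elif is_correct(aln):
            correct += 1
        else:
            wrong += 1
    return correct, wrong, unmapped


def sensitivity(correct: int, wrong: int, unmapped: int) -> float:
    # Convention assumed here: every read not placed at its origin is a miss.
    total = correct + wrong + unmapped
    return correct / total if total else 0.0


def precision(correct: int, wrong: int) -> float:
    # Fraction of mapped reads that landed at their origin.
    mapped = correct + wrong
    return correct / mapped if mapped else 0.0


if __name__ == "__main__":
    demo = [
        Alignment("chr1_100_200_r1", "chr1", 105, 205),  # overlaps origin
        Alignment("chr1_500_600_r2", "chr2", 500, 600),  # wrong contig
        Alignment("chr1_700_800_r3"),                    # unmapped
    ]
    c, w, u = count_outcomes(demo)
    print("sensitivity:", sensitivity(c, w, u))
    print("precision:", precision(c, w))
```

To get the per-MAPQ breakdown mentioned earlier in the thread, the same counts could be computed separately for the reads at or above each MAPQ threshold.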