Greetings,
GA4GH benchmarking performance metrics (https://github.com/ga4gh/benchmarking-tools/blob/update-perf-metrics-definitions/doc/standards/GA4GHBenchmarkingPerformanceMetricsDefinitions.md) and PrecisionFDA Truth Challenge (https://precision.fda.gov/challenges/truth/results) define a "Truth.TP" and a "Query.TP" metric, used to report the quality of variant calls.
I don't understand the definitions, and I'm wondering if anyone can explain and/or point me to where there is a more detailed description of just how these metrics are counted or calculated?
Thanks
These terms are used a lot in the clinical setting but not so much in research. Consider the following situation:
Truth.TP
We have a DNA sample, and we rigorously and exhaustively aim to determine all single nucleotide variants (SNVs) and insertions-deletions (indels) in this sample when compared to a defined reference genome (it can be any reference, but it needs to be explicitly specified). We use all of the gold-standard methods at our disposal in order to determine these variants. Thus, at the end of our work, we know with near certainty the genomic location of every variant in the sample.
This is the 'truth' true-positive (Truth TP) dataset because we know each and every variant that is found in it. For now, let's consider that there are exactly 100 SNVs and 10 indels.
Query.TP
Given the same sample that was used above, we now wish to test our next-generation sequencing (NGS) analysis method to see how well it fares in discovering all of the known variants defined in the truth sample. We come up with a list of SNVs and indels in this sample using our method, and we then query these (i.e. compare them) against those already defined in the Truth TP. Thus, our list of variants becomes the 'Query TP', because it is being queried against the Truth TP.
If we are only successful in detecting 97 of the 100 SNVs, then we have 97% sensitivity with our method on SNVs. If we detect just 5 of the 10 indels, then we have just 50% sensitivity with our method on indels.
Further, we may additionally call variants in positions where there has not been a variant reported in the Truth TP. This will then affect the specificity.
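The sensitivity arithmetic above can be sketched in a few lines of Python (the counts are the hypothetical 100 SNVs / 10 indels from the example, not real data):

```python
def sensitivity(detected, total_true):
    """Fraction of true variants that the method detected (aka recall)."""
    return detected / total_true

# Hypothetical truth set: 100 SNVs (97 detected) and 10 indels (5 detected).
snv_sensitivity = sensitivity(97, 100)
indel_sensitivity = sensitivity(5, 10)

print(f"SNV sensitivity:   {snv_sensitivity:.0%}")    # 97%
print(f"Indel sensitivity: {indel_sensitivity:.0%}")  # 50%
```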
Other metrics
The other metrics are all explained on the page at which you have already been looking.
I also responded to a previous Biostars question related to this topic: A: Measurement of confidence interval of forest plot for diagnostic Odd Ratio

Thank you Kevin, I think I'm getting it! :-)
Here are definitions from the above sites:
True Positive in Truth (TRUTH.TP): a site in the Truth Call Set for which there are paths through the Query Call Set that are consistent with all of the alleles at this site...
True Positive in Query (QUERY.TP): a site in the Query Call Set for which there are paths through the Truth Call Set that are consistent with all of the alleles at this site...
Recall (METRIC.Recall) (aka, True Positive Rate or Sensitivity): TRUTH.TP/(TRUTH.TP+TRUTH.FN)
Precision (METRIC.Precision) (aka, Positive Predictive Value): QUERY.TP/(QUERY.TP+QUERY.FP)
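Those two formulas can be written down directly. As a sketch, using the counts quoted below from the comparison (TRUTH.TP = 7749, QUERY.TP = 7984, QUERY.FP = 10670; TRUTH.FN is not quoted, so recall is left uncomputed here):

```python
def recall(truth_tp, truth_fn):
    """METRIC.Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN)"""
    return truth_tp / (truth_tp + truth_fn)

def precision(query_tp, query_fp):
    """METRIC.Precision = QUERY.TP / (QUERY.TP + QUERY.FP)"""
    return query_tp / (query_tp + query_fp)

# Precision from the comparison figures: 7984 / (7984 + 10670) ≈ 0.428,
# i.e. fewer than half of the query calls match the truth set.
print(f"{precision(7984, 10670):.3f}")
```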
and here are results from a comparison:
Perhaps naively, I would have thought that Truth.TP would equal Query.TP, but I think what these numbers reflect is that 7984 variants in the query set align with 7749 variants in the truth set. So, Recall indicates the rate at which true variants were detected, whereas Precision represents the proportion of calls in the query set that are true.
Would you say I've stated that correctly?
Thanks again, Bob
Hi Bob,

Well, those numbers indicate that the query dataset has called many more variants than exist in the truth dataset (10670 false-positives). I believe that you're more or less correct on the definitions of sensitivity and precision, though:
Thus, it's possible to have 100% sensitivity but poor precision.
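A hypothetical illustration of that point: imagine a caller that finds all 100 true variants (no false-negatives) but also makes 900 spurious calls (the counts are made up for the example):

```python
truth_tp, truth_fn = 100, 0    # every true variant was found
query_tp, query_fp = 100, 900  # ...but 900 extra calls were made

sensitivity = truth_tp / (truth_tp + truth_fn)  # 1.0 -> 100% sensitivity
precision = query_tp / (query_tp + query_fp)    # 0.1 -> only 10% precision

print(sensitivity, precision)
```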
Note that in my initial answer (above), I haven't alluded to false-negatives in the Truth dataset.
Kevin