Greetings,
GA4GH benchmarking performance metrics (https://github.com/ga4gh/benchmarking-tools/blob/update-perf-metrics-definitions/doc/standards/GA4GHBenchmarkingPerformanceMetricsDefinitions.md) and PrecisionFDA Truth Challenge (https://precision.fda.gov/challenges/truth/results) define a "Truth.TP" and a "Query.TP" metric, used to report the quality of variant calls.
I don't understand the definitions, and I'm wondering if anyone can explain and/or point me to where there is a more detailed description of just how these metrics are counted or calculated?
Thanks
These terms are used a lot in the clinical setting but not so much in research. Consider the following situation:
Truth.TP
We have a DNA sample, and we rigorously and exhaustively aim to determine all single nucleotide variants (SNVs) and insertions-deletions (indels) in this sample when compared to a defined reference genome (it can be any reference, but it needs to be explicitly specified). We use all of the gold-standard methods at our disposal in order to determine these variants. Thus, at the end of our work, we know with near certainty the genomic location of every variant in the sample.
This is the 'truth' true-positive (Truth TP) dataset because we know each and every variant that is found in it. For now, let's consider that there are exactly 100 SNVs and 10 indels.
Query.TP
Given the same sample that was used above, we now wish to test our next-generation sequencing (NGS) analysis method to see how well it fares in discovering all of the known variants defined in the truth sample. We come up with a list of SNVs and indels in this sample using our method, and we then query these (i.e. compare them) against those already defined in the Truth TP. Thus, our list of variants becomes the 'Query TP', because it is being queried against the Truth TP.
If we are only successful in detecting 97 of the 100 SNVs, then we have 97% sensitivity with our method on SNVs. If we detect just 5 of the 10 indels, then we have just 50% sensitivity with our method on indels.
Further, we may additionally call variants in positions where there has not been a variant reported in the Truth TP. This will then affect the specificity.
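The sensitivity arithmetic above can be sketched in a few lines of Python (the counts are the hypothetical 100 SNVs / 10 indels from the example, not real data):

```python
def sensitivity(detected, total_true):
    """Fraction of true variants that the method detected (aka recall)."""
    return detected / total_true

# Hypothetical truth set: 100 SNVs (97 detected) and 10 indels (5 detected).
snv_sensitivity = sensitivity(97, 100)
indel_sensitivity = sensitivity(5, 10)

print(f"SNV sensitivity:   {snv_sensitivity:.0%}")    # 97%
print(f"Indel sensitivity: {indel_sensitivity:.0%}")  # 50%
```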
Other metrics
The other metrics are all explained on the page at which you have already been looking.
I also responded to a previous Biostars question related to this topic: A: Measurement of confidence interval of forest plot for diagnostic Odd Ratio

Thank you Kevin, I think I'm getting it! :-)
Here are definitions from the above sites:
True Positive in Truth (TRUTH.TP): a site in the Truth Call Set for which there are paths through the Query Call Set that are consistent with all of the alleles at this site...
True Positive in Query (QUERY.TP): a site in the Query Call Set for which there are paths through the Truth Call Set that are consistent with all of the alleles at this site...
Recall (METRIC.Recall) (aka, True Positive Rate or Sensitivity): TRUTH.TP/(TRUTH.TP+TRUTH.FN)
Precision (METRIC.Precision) (aka, Positive Predictive Value): QUERY.TP/(QUERY.TP+QUERY.FP)
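Those two formulas can be written down directly. As a sketch, using the counts quoted below from the comparison (TRUTH.TP = 7749, QUERY.TP = 7984, QUERY.FP = 10670; TRUTH.FN is not quoted, so recall is left uncomputed here):

```python
def recall(truth_tp, truth_fn):
    """METRIC.Recall = TRUTH.TP / (TRUTH.TP + TRUTH.FN)"""
    return truth_tp / (truth_tp + truth_fn)

def precision(query_tp, query_fp):
    """METRIC.Precision = QUERY.TP / (QUERY.TP + QUERY.FP)"""
    return query_tp / (query_tp + query_fp)

# Precision from the comparison figures: 7984 / (7984 + 10670) ≈ 0.428,
# i.e. fewer than half of the query calls match the truth set.
print(f"{precision(7984, 10670):.3f}")
```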
and here are results from a comparison:
Perhaps naively, I would have thought that Truth.TP would equal Query.TP, but I think what these numbers reflect is that 7984 variants in the query set align with 7749 variants in the truth set. So, Recall indicates the rate at which true variants were detected, whereas Precision represents the proportion of calls in the query set that are true.
Would you say I've stated that correctly?
Thanks again, Bob
Hi Bob,

Well, those numbers indicate that the query dataset has called many more variants than exist in the truth dataset (10670 false-positives). I believe that you're more or less correct on the definitions of sensitivity and precision, though:
Thus, it's possible to have 100% sensitivity but poor precision.
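A hypothetical illustration of that point: imagine a caller that finds all 100 true variants (no false-negatives) but also makes 900 spurious calls (the counts are made up for the example):

```python
truth_tp, truth_fn = 100, 0    # every true variant was found
query_tp, query_fp = 100, 900  # ...but 900 extra calls were made

sensitivity = truth_tp / (truth_tp + truth_fn)  # 1.0 -> 100% sensitivity
precision = query_tp / (query_tp + query_fp)    # 0.1 -> only 10% precision

print(sensitivity, precision)
```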
Note that in my initial answer (above), I haven't alluded to false-negatives in the Truth dataset.
Kevin