Benchmarking small variant calls: TP, FP, TN, FN
5.5 years ago
Tails ▴ 80

I'm reading this excellent paper, which describes a methodology to standardize the variant benchmarking process. The authors say that the standard binary classification framework (i.e., TP, FP, FN, and statistics derived from these) is not straightforward to apply to variant calls, so they go on to describe, in tabular form, how they handle it.

It's also worth noting that this is how they define sensitivity and specificity:

sensitivity (the ability to detect variants that are known to be present or “absence of false negatives”) and specificity (the ability to correctly identify the absence of variants or “absence of false positives”)

They do not use specificity, opting instead for precision because "precision is often a more useful metric than specificity due to the very large proportion of true negative positions in the genome."
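The definitions above reduce to simple arithmetic. Below is a minimal Python sketch, assuming the usual formulas recall = TP/(TP+FN) and precision = TP/(TP+FP). It takes truth-side and query-side TP counts separately, because the same variant can be represented by a different number of records in the truth and query call sets (hap.py, for example, reports TRUTH.TP and QUERY.TP as separate columns). The counts in the example, and the figure of roughly 3 billion true-negative positions, are hypothetical. They only illustrate why specificity carries little information at genome scale.

```python
def benchmark_metrics(truth_tp, truth_fn, query_tp, query_fp):
    """Return (recall, precision, F1) from benchmarking counts.

    Truth-side TP feeds recall and query-side TP feeds precision,
    since the two call sets may represent a variant differently.
    """
    recall = truth_tp / (truth_tp + truth_fn)      # a.k.a. sensitivity
    precision = query_tp / (query_tp + query_fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1


def specificity(tn, fp):
    """Specificity = TN / (TN + FP)."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    tn = 3_000_000_000  # rough number of non-variant positions (assumption)
    for fp in (20_000, 200_000):
        recall, precision, f1 = benchmark_metrics(
            truth_tp=3_960_000, truth_fn=40_000, query_tp=3_960_000, query_fp=fp
        )
        print(f"FP={fp}: recall={recall:.4f} precision={precision:.4f} "
              f"F1={f1:.4f} specificity={specificity(tn, fp):.6f}")
```

With these made-up numbers, a tenfold increase in false positives drops precision from about 0.995 to about 0.952, while specificity only changes in the fifth decimal place.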

[Figure: contingency table from the paper]

I'm having a hard time understanding why the FP entries are the way they are. I would have thought that all the n/a's in the first column (e.g., "ref/var2", "ref/var3", "var1/var3", etc.) would be FPs as well.
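To make the FP question concrete, here is one naive way to score a single site, written as a Python sketch. It is a hypothetical allele-copy counting scheme, not the rule set behind the paper's contingency table: each non-reference allele copy found in both genotypes is a TP, each copy found only in the query is an FP, and each copy found only in the truth is an FN. Genotypes are written as tuples of labels, with 0 for ref and 1, 2, 3 for var1, var2, var3.

```python
from collections import Counter


def count_allele_matches(truth_gt, query_gt):
    """Hypothetical allele-copy scoring of one site (not the paper's table).

    truth_gt, query_gt: tuples of allele labels, 0 = ref, 1.. = var1, var2, ...
    Returns (tp, fp, fn) counted over non-reference allele copies.
    """
    truth = Counter(a for a in truth_gt if a != 0)
    query = Counter(a for a in query_gt if a != 0)
    tp = sum((truth & query).values())   # copies present in both
    fp = sum((query - truth).values())   # extra copies in the query
    fn = sum((truth - query).values())   # copies missed by the query
    return tp, fp, fn


if __name__ == "__main__":
    print(count_allele_matches((0, 1), (0, 2)))  # truth ref/var1, query ref/var2
    print(count_allele_matches((1, 1), (0, 1)))  # truth var1/var1, query ref/var1
```

Under this scheme, Truth ref/var1 vs Query ref/var2 gives one FP and one FN, which matches the intuition in the question. The paper applies its own matching rules, so its table need not agree with this sketch.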

It's also hard to work out why n/a's occur in the rest of the table. This may have something to do with their comment:

"Note that we have chosen not to include true negatives (or consequently specificity) in our standardized definitions. This is due to the challenge in defining the number of true negatives, particularly for indels or around complex variants."

Do the n/a's represent true negatives? It is all very confusing.

genome sequencing SNP

The work is overly complicated and incomplete. I read this pre-print in depth and came up with many points of critique. In this regard, it is also important to point out that it has not yet been published (to my knowledge). The author list also omits some important names who have been working in this field for many years, so it is not holistic and will not be adopted widely by the community. Apart from anything else, the work is, to some extent, a shameless plug for the authors' own tools.

Interesting, I would be curious to read your critiques. What other tools out there are good? I'm still looking at this problem months later.

I suppose the author list could have included, e.g., Heidi Rehm, and at least one representative from Ambry Genetics, a company that has been working in the clinical genetics space for many years. Ambry has published an eye-opening paper on the detection of genetic variants by NGS in a clinical context. Representatives from regulatory bodies such as the NGRL (Manchester, UK), NEQAS, and other national bodies across Europe could also have been included.

It has since come to my attention that the work has indeed been published, but I am afraid that I still cannot see it being widely adopted because it has in no way been a community effort. The conflicts of interest are quite apparent, with representatives from Illumina on the paper, and also the owner / developer of RTG in NZ.

Finally, the work genuinely comes across as overly complicated and convoluted.

Frankly, and this is not directed at the authors here, it is scary how people in positions of authority are actively tolerating errors from Illumina's SBS NGS sequencers and pushing this into live genetic testing laboratories, in some cases even suggesting that no validation of the findings is required.

I will admit that I have not yet seen the peer reviewed work, so, perhaps things have changed since the pre-print.

5.5 years ago

It's also hard to decipher why the n/a's occur in the rest of the table?

I think it is because those scenarios can't happen. If you compare two individuals and the Query has var1/var3, then there has to be at least a var2 in your comparison; otherwise you can't have a var3.

If Query = GT:1/3, then the only scenario for Truth is GT:1/2 in a VCF file.
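The reasoning above can be checked mechanically. In a VCF record, GT allele indices refer to REF (0) and to the comma-separated ALT alleles (1, 2, 3, ...), so a GT of 1/3 is only valid when the record lists at least three ALT alleles. The Python sketch below checks that, plus a second, hypothetical rule: that the var labels used across the genotypes being compared run 1..k without gaps. The function names are made up for illustration, and this is not how the paper or any benchmarking tool builds its table.

```python
def parse_gt(gt):
    """Split a VCF GT value such as '1/3' or '0|1' into integer allele indices.

    Missing alleles ('.') are skipped.
    """
    return [int(a) for a in gt.replace("|", "/").split("/") if a != "."]


def gt_is_valid(gt, n_alt):
    """True if every allele index refers to REF (0) or one of the n_alt ALT alleles."""
    return all(0 <= idx <= n_alt for idx in parse_gt(gt))


def labels_are_contiguous(*gts):
    """True if the ALT labels used across all genotypes are exactly 1..k with no gaps.

    Assumes (hypothetically) that var1, var2, var3 are numbered
    consecutively over the truth and query genotypes being compared.
    """
    labels = {idx for gt in gts for idx in parse_gt(gt) if idx > 0}
    return labels == set(range(1, len(labels) + 1))


if __name__ == "__main__":
    print(gt_is_valid("1/3", n_alt=2))          # False: no allele 3 with only two ALTs
    print(gt_is_valid("1/3", n_alt=3))          # True
    print(labels_are_contiguous("1/3"))         # False: var3 without var2
    print(labels_are_contiguous("1/2", "1/3"))  # True: Truth 1/2, Query 1/3
```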
