The Concept of Annotation in Comparison to Assessment
===
At a company with both clinical and bioinformatics leaders, I have noticed confusion about the concept of “annotation.” I believe this stems from the fact that annotation means one thing in a clinical context and something quite different in a bioinformatics context.
When a bioinformatician says, “I need to annotate a variant,” what they really mean is, “I have a column of variants in my spreadsheet and I need to pull in many additional columns from reference datasets that tell me more about each variant so that I can make a well-founded assessment.”
Thus, in bioinformatics, “annotation” is effectively synonymous with “reference data”: supplemental information about a genetic data point that is used to inform a decision.
The diagram at the link below shows how Berkeley’s open-source software, Varant, pulls data from various genomic knowledgebases to annotate a VCF file. The sources of annotation (on the right) are considered to be integrated into the software. https://imgur.com/KACAU1v
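In code, annotation in the bioinformatics sense amounts to a left join: each variant row keeps its identity and gains columns looked up from a reference dataset. The sketch below assumes variants are keyed by (chrom, pos, ref, alt) as in a VCF row; the `annotate` function and the `REFERENCE_DATA` table, with its gene names and allele frequencies, are hypothetical placeholders, not Varant's implementation or the contents of any real database.

```python
# A variant keyed as in a VCF row: (chrom, pos, ref, alt).
VariantKey = tuple[str, int, str, str]

# Hypothetical reference dataset: keys, gene names, and allele frequencies are
# illustrative placeholders, not records from any real knowledgebase.
REFERENCE_DATA: dict[VariantKey, dict[str, object]] = {
    ("chr1", 1000, "A", "G"): {"gene": "GENE_A", "population_af": 0.0001},
    ("chr2", 2000, "C", "T"): {"gene": "GENE_B", "population_af": 0.12},
}

# Columns filled in when a variant has no match in the reference data.
EMPTY_COLUMNS: dict[str, object] = {"gene": None, "population_af": None}


def annotate(variants: list[VariantKey]) -> list[dict[str, object]]:
    """Annotation in the bioinformatics sense: pull extra columns from reference
    data for each variant. No judgment about the variant is made here."""
    rows = []
    for chrom, pos, ref, alt in variants:
        row: dict[str, object] = {"chrom": chrom, "pos": pos, "ref": ref, "alt": alt}
        # Left-join semantics: unmatched variants keep their row with empty columns.
        row.update(REFERENCE_DATA.get((chrom, pos, ref, alt), EMPTY_COLUMNS))
        rows.append(row)
    return rows
```

Calling `annotate([("chr1", 1000, "A", "G"), ("chr3", 3000, "G", "A")])` returns one row per input variant: the first gains `gene` and `population_af` values, while the second, absent from the reference table, gets `None` for both. Note what the function does not return: any judgment about pathogenicity. That decision is the assessment.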
===
Having established this, let’s change gears and examine annotation from a clinical perspective. As the chart below [removed for IP purposes] illustrates, a clinical geneticist examines many forms of annotation about a variant before assessing whether or not it is pathogenic. In the first annotation row, the geneticists have indicated their agreement that this variant should be considered “pathogenically strong” according to the information in the _ annotation category.
This is where it gets confusing. The dictionary definition of annotate is “to add notes to a text in order to give an explanation or comment.” So when a doctor makes an assessment by commenting on a patient’s condition in their medical record, they are performing annotation in the literal sense. Hence, in the clinical context, “making an assessment” is synonymous with “annotating the patient’s condition.” To take this a step further, when clinicians submit their variant assessments to a knowledgebase, or gather them into one, those assessments become part of a reference dataset. The flow is: I review annotation, I make an assessment, I submit my assessment to a knowledgebase, the knowledgebase’s curators determine whether it is significant enough to be valid, and, if so, it becomes part of the annotation.
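The flow above can be modeled as a toy pipeline in which an assessment becomes annotation only after curation. This is a sketch under stated assumptions: the `Assessment` and `Knowledgebase` classes, their fields, and the curation callback are hypothetical and are not modeled on any real knowledgebase's submission process. The clinical judgment itself is passed in as a plain value, because it is made by a person, not computed.

```python
from dataclasses import dataclass, field
from typing import Callable

# A variant keyed as in a VCF row: (chrom, pos, ref, alt).
VariantKey = tuple[str, int, str, str]


@dataclass(frozen=True)
class Assessment:
    """A clinician's judgment about one variant (hypothetical structure)."""
    variant: VariantKey
    classification: str  # the human decision, e.g. "likely pathogenic"
    submitter: str


@dataclass
class Knowledgebase:
    """Hypothetical curated knowledgebase, not modeled on any real database."""
    pending: list[Assessment] = field(default_factory=list)
    accepted: dict[VariantKey, list[Assessment]] = field(default_factory=dict)

    def submit(self, assessment: Assessment) -> None:
        # The clinician submits; nothing is visible as annotation yet.
        self.pending.append(assessment)

    def curate(self, is_valid: Callable[[Assessment], bool]) -> None:
        # Curators (modeled as a hypothetical callback) accept or reject each submission.
        for assessment in self.pending:
            if is_valid(assessment):
                self.accepted.setdefault(assessment.variant, []).append(assessment)
        self.pending.clear()

    def annotation_for(self, variant: VariantKey) -> list[str]:
        # Accepted assessments are what the next reviewer pulls in as annotation.
        return [a.classification for a in self.accepted.get(variant, [])]
```

In use, a clinician's `submit` call leaves `annotation_for(variant)` unchanged; only after `curate` accepts the submission does it appear in the list that the next reviewer pulls in as annotation.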
===
Somewhere along the way, as confidence in public reference datasets grew, the term “annotation” shifted in the vernacular of the broader genomics community from “notes about a patient’s condition” to “reference data.”
In conclusion, it is crucial that we, as a community, distinguish between annotation and assessment not only in our literature but also in our product interfaces.