Looking at various data in VCF files and places like dbSNP, it's not uncommon to find different descriptions of the same sequence change. A simple example would be a case where the sequence is ATTTTTG and one T is deleted. Which T? The first or the last?
It gets more complicated, of course. The ΔF508 deletion in CF can be written (hg19 coordinates) not only as:
chr7:117199646 CTTT -> T or chr7:117199646 CTT -> .
But also:
chr7:117199644 ATCT -> A
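One way to confirm that these descriptions really are the same change is to apply each one to the reference and compare the resulting sequences. The following is only a minimal sketch: it assumes the six reference bases at chr7:117199644-117199649 are ATCTTT, which is inferred purely from the REF alleles quoted above rather than looked up in hg19. `apply_variant` is a hypothetical helper, and the "." allele is written as an empty string.

```python
# Local reference window, inferred from the REF alleles quoted above
# (ATCT starting at ...644 and CTTT starting at ...646). Assumption, not a lookup.
REF_START = 117199644
REF_SEQ = "ATCTTT"


def apply_variant(pos, ref, alt):
    """Return the local window with `ref` replaced by `alt` at 1-based position `pos`."""
    offset = pos - REF_START
    if REF_SEQ[offset:offset + len(ref)] != ref:
        raise ValueError(f"REF allele {ref!r} does not match the reference at {pos}")
    return REF_SEQ[:offset] + alt + REF_SEQ[offset + len(ref):]


descriptions = [
    (117199646, "CTTT", "T"),
    (117199646, "CTT", ""),   # "." written as an empty ALT allele
    (117199644, "ATCT", "A"),
]

# If the descriptions are equivalent, the set collapses to a single sequence.
results = {apply_variant(pos, ref, alt) for pos, ref, alt in descriptions}
print(results)
```

Comparing the resulting sequences sidesteps the question of which notation is "right", but it needs the reference bases, so it is not a substitute for agreeing on one normalized form.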
I'm working on some software for interpreting and comparing results, and I normalize these descriptions internally, but I wonder whether there is a specific standard I should follow.
From the link: "for all descriptions the most 3' position possible is arbitrarily assigned to have been changed, this is important especially in single residue (nucleotide or amino acid) stretches or tandem repeats"
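The "most 3' position" rule in that quote can be sketched for a simple deletion as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, positions are 0-based offsets into a forward-strand string, and only plain deletions are handled. The left-shifting counterpart is included because it gives the other end of the same ambiguous range.

```python
def shift_deletion_3prime(seq, start, length):
    """Slide a deletion of `length` bases at 0-based `start` as far right as possible.

    Moving one base right leaves the result unchanged exactly when the base
    leaving the deleted window equals the base entering it.
    """
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


def shift_deletion_5prime(seq, start, length):
    """Slide the same deletion as far left as possible (the left-aligned form)."""
    while start > 0 and seq[start - 1] == seq[start + length - 1]:
        start -= 1
    return start


# Single-T deletion in ATTTTTG: any of the five T's gives the same result.
seq = "ATTTTTG"
rightmost = shift_deletion_3prime(seq, 1, 1)   # 0-based index of the last T
leftmost = shift_deletion_5prime(seq, 5, 1)    # 0-based index of the first T

# Three-base deletion in ATCTTT (the window implied by the alleles in the question).
window = "ATCTTT"
right3 = shift_deletion_3prime(window, 1, 3)   # TCT shifted right becomes CTT
left3 = shift_deletion_5prime(window, 2, 3)    # CTT shifted left becomes TCT
print(rightmost, leftmost, right3, left3)
```

Whichever direction is chosen, applying it consistently to every variant before comparison is what makes two descriptions of the same deletion compare equal.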