As referenced in the pVCF documentation, what are overlapping variants? https://www.biorxiv.org/content/biorxiv/early/2018/06/11/343970.full.pdf
Hybrid allelic representation. To facilitate downstream summary statistics without double-counting, ideal unified sites would be completely nonoverlapping, with mutually-exclusive alleles.
I understand SNVs, insertions, deletions, copy number variants… but what does an overlap look like? I can't find any descriptive information aside from mentions that "variants can overlap."
Is this when a single mutation is represented multiple ways by different ordering of alleles?
Thanks Pierre. Please allow me to clarify as I am not all the way there yet.
Guessing your 4th column is ALT and the 5th column is REF.
CAGA represents a 4-nucleotide insertion? And the last A in that insertion falls at position 13462, where it is being counted a second time as a SNP?
I looked at gnomAD and dbSNP, but couldn't make much sense of it.
In VCF, column 4 is REF and column 5 contains the ALT alleles. While Pierre's example doesn't contain the other columns that would be needed to make it valid VCF, I would assume that is what he intends.
In that case, the first variant (CAGA to C) represents a three-base deletion (the AGA is not present in the ALT), and the second (AAGT to A) also represents a three-base deletion, of AGT. This is a good example of the difficulties of working with VCF. Notice that the second variant could equally be represented as:
1 13463 rs1441058751 AGTA A
So there isn't even any syntactic overlap if that representation were used.
1 13459 rs1315857414 CAGAAGT CA
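A minimal Python sketch of what "syntactic overlap" means here, assuming only the VCF rule that REF starts at POS: a record's REF spans POS through POS + len(REF) - 1, and two records overlap syntactically when those spans intersect. The REF/ALT strings come from this thread (CAGA from the question, AAGT and AGTA from Len's reply); the ID of the AAGT record is not given, so VCF's missing value "." stands in. `Record`, `parse`, `ref_overlap`, and `deleted_positions` are hypothetical helpers, not part of any VCF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    chrom: str
    pos: int  # VCF POS: 1-based position of the first REF base
    vid: str
    ref: str
    alt: str

    @property
    def ref_end(self) -> int:
        """Last reference position (inclusive) spanned by REF."""
        return self.pos + len(self.ref) - 1


def parse(line: str) -> Record:
    """Parse the first five VCF columns: CHROM POS ID REF ALT."""
    chrom, pos, vid, ref, alt = line.split()[:5]
    return Record(chrom, int(pos), vid, ref, alt)


def ref_overlap(a: Record, b: Record) -> range:
    """Reference positions spanned by both REF alleles (empty range if none)."""
    if a.chrom != b.chrom:
        return range(0)
    return range(max(a.pos, b.pos), min(a.ref_end, b.ref_end) + 1)


def deleted_positions(rec: Record) -> range:
    """Positions removed by a simple deletion whose ALT is a prefix of REF."""
    if len(rec.alt) < len(rec.ref) and rec.ref.startswith(rec.alt):
        return range(rec.pos + len(rec.alt), rec.pos + len(rec.ref))
    return range(0)


first = parse("1 13459 rs1315857414 CAGA C")
second = parse("1 13462 . AAGT A")  # ID not given in the thread; "." is VCF's missing value
shifted = parse("1 13463 rs1441058751 AGTA A")  # Len's alternative representation

print(list(ref_overlap(first, second)))   # [13462]
print(list(deleted_positions(first)))     # [13460, 13461, 13462]
print(list(deleted_positions(second)))    # [13463, 13464, 13465]
print(list(ref_overlap(first, shifted)))  # []
```

The only position shared by the first two records is 13462: the first record deletes that A, while the second uses it as the padding (anchor) base VCF requires before a deletion. The bases each record actually removes do not intersect, and the shifted record does not overlap the first at all.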
(And conversely, you could represent each of your variants as separate single-base deletions.)
When you are matching and comparing VCF variants, you really need to compare at the level of the underlying haplotypes rather than at the syntactic level of what is in the REF and ALT fields. For this I recommend a tool such as vcfeval from RTG Tools (which I help develop).
Thanks for the explanation, Len. Seems like computer scientists trying to save a few bits. I'm sure I will be running that tool of yours soon enough.
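To make the haplotype-level comparison concrete, here is a toy Python check. It only illustrates the idea and is not how vcfeval works internally. It assumes all records lie on one haplotype, ignores genotypes, and uses the fragment chr1:13459-13466 = CAGAAGTA, pieced together from the REF fields quoted in the thread (CAGAAGT and AGTA). `trim`, `haplotype`, and `same_haplotype` are hypothetical names. Stripping the leading bases that REF and ALT share (such as the padding base) is what allows syntactically overlapping records, like the three single-base deletions below, to be applied together.

```python
from typing import Iterable, Tuple

REF_START = 13459     # assumed: chr1 position of REF_SEQ[0]
REF_SEQ = "CAGAAGTA"  # assumed: chr1:13459-13466, pieced together from REF fields in the thread

Variant = Tuple[int, str, str]  # (POS, REF, ALT) with 1-based POS


def trim(v: Variant) -> Variant:
    """Drop leading bases shared by REF and ALT, such as the VCF padding base."""
    pos, ref, alt = v
    while ref and alt and ref[0] == alt[0]:
        pos, ref, alt = pos + 1, ref[1:], alt[1:]
    return pos, ref, alt


def haplotype(records: Iterable[Variant], ref_seq: str = REF_SEQ,
              ref_start: int = REF_START) -> str:
    """Apply all records (assumed to be on one haplotype) to the reference fragment."""
    pieces, cursor = [], 0  # cursor: offset of the next reference base not yet consumed
    for pos, ref, alt in sorted(trim(v) for v in records):
        offset = pos - ref_start
        if offset < cursor:
            raise ValueError(f"records still overlap after trimming at {pos}")
        if ref_seq[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF {ref!r} does not match the reference at {pos}")
        pieces.append(ref_seq[cursor:offset])  # untouched reference bases
        pieces.append(alt)                     # replacement (empty for a pure deletion)
        cursor = offset + len(ref)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


def same_haplotype(a: Iterable[Variant], b: Iterable[Variant]) -> bool:
    """Compare two call sets by the sequence they produce, not by REF/ALT text."""
    return haplotype(a) == haplotype(b)


one_deletion = [(13462, "AAGT", "A")]
shifted = [(13463, "AGTA", "A")]
single_bases = [(13462, "AA", "A"), (13463, "AG", "A"), (13464, "GT", "G")]

print(haplotype(one_deletion), haplotype(shifted), haplotype(single_bases))  # CAGAA CAGAA CAGAA
print(same_haplotype(one_deletion, single_bases))                             # True
```

The three single-base records overlap each other on paper (each one's anchor base is deleted by the previous record), yet all three call sets produce CAGAA, so a haplotype-level comparison treats them as the same event.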