From the 1000 Genomes VCF:
1 191160243 rs68092106 TC T 100 PASS AC=1393;AF=0.278155
1 191160244 rs10801031 C T 100 PASS AC=1022;AF=0.204073
It seems a little suspicious that these have similar frequencies. Are these two rows reporting the same variant? Or can callers really distinguish between an indel and a SNP that perfectly overlaps the beginning or end of an indel?
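To make the overlap concrete, here is a minimal sketch (not any caller's actual logic) that works out which reference bases the deletion removes, assuming the usual left-anchored VCF representation where the first base of REF is kept. The helper name `deleted_span` is made up for illustration.

```python
def deleted_span(pos, ref, alt):
    """Return the (start, end) 1-based inclusive reference positions removed
    by a left-anchored VCF deletion, or None if this is not a simple deletion."""
    if len(ref) > len(alt) and ref.startswith(alt):
        return pos + len(alt), pos + len(ref) - 1
    return None

# The two records quoted above (POS, REF, ALT)
deletion = (191160243, "TC", "T")   # rs68092106
snv = (191160244, "C", "T")         # rs10801031

span = deleted_span(*deletion)
# The SNV overlaps the deletion if its position falls inside the deleted span
snv_overlaps_deletion = span is not None and span[0] <= snv[0] <= span[1]
print(span, snv_overlaps_deletion)
```

Under that assumption the deleted base is the C at 191160244, which is exactly the base the SNV changes, so the two records describe different alleles at the same reference position.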
TOPMed reports similar results to 1000 Genomes:
https://bravo.sph.umich.edu/freeze5/hg38/variant/1-191191114-C-T
https://bravo.sph.umich.edu/freeze5/hg38/variant/1-191191113-TC-T
So I would guess they really are two different variants, unless this is a known issue with variant calling. As long as the calls are sequence-based, I would think callers could tell them apart, right?
Thanks, that makes sense
Actually, this raises another question. I came across this example:
It seems the deletion would occur in ~62% of haplotypes, removing the C at that position, while the SNV occurs in ~59% of haplotypes.
Considering the context: https://www.ncbi.nlm.nih.gov/genome/gdv/browser/genome/?id=GCF_000001405.25, I am not sure how you can have the described SNV ~59% of the time if the base it depends on is deleted ~62% of the time. Sorry, I feel a bit silly about this one.
The two frequencies don't have to be parts of the same 100%. Each one is relative to the number of samples/chromosomes called at that site, not the total number in the cohort. The deletion could be at 62% among 200 chromosomes and the SNV at 59% among a totally different or only slightly overlapping set of 300 chromosomes. You'd need to look at the genotypes in each individual to see if there are any combinations that don't make sense, like a hom-alt deletion AND a het SNV in the same diploid person.
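A rough sketch of that per-individual check, in case it helps. It assumes diploid, biallelic genotypes and that a single haplotype cannot carry both the deletion and the SNV, so the ALT copies of the two records should never add up to more than 2 in one person. The sample names and genotypes are invented toy data, and the function names are hypothetical.

```python
def alt_count(gt):
    """Count ALT alleles in a diploid GT string such as '0|1' or '1/1'."""
    alleles = gt.replace("|", "/").split("/")
    return sum(a == "1" for a in alleles)

def inconsistent_samples(del_gts, snv_gts, ploidy=2):
    """Return samples whose deletion + SNV ALT copies exceed the ploidy."""
    flagged = []
    for sample, del_gt in del_gts.items():
        snv_gt = snv_gts.get(sample)
        if snv_gt is None:
            continue  # sample not genotyped for the SNV record
        if alt_count(del_gt) + alt_count(snv_gt) > ploidy:
            flagged.append(sample)
    return flagged

# Toy genotypes (hypothetical), keyed by sample name
del_gts = {"S1": "1|1", "S2": "0|1", "S3": "0|0"}
snv_gts = {"S1": "0|1", "S2": "1|0", "S3": "1|1"}

flagged = inconsistent_samples(del_gts, snv_gts)
print(flagged)
```

Here only S1 is flagged (hom-alt deletion plus het SNV). S2 carries one copy of each, which is fine if they sit on different haplotypes, and S3 has no deletion at all.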
Not to mention that gnomAD reports wildly different frequencies for them:
This really helped it click, thank you again for taking the time to respond to this question and all the others you have contributed to on biostars.
No problem, happy to help!