Hi,
I want to left-normalize (i.e. shift toward the 5' end) all genomic variants in my pipeline, but something unexpected happened with the 1:17371287 GAGGT/- variant. If I use this VCF as input:
#CHROM POS ID REF ALT QUAL FILTER INFO
1 17371286 1:17371287_GAGGT/- TGAGGT T . PASS
The output of both vt normalize and bcftools norm is:
1 17371285 1:17371287_GAGGT/- ATGAGG A
That is, the variant has been moved one position to the left. However, when I look at the reference sequence, I don't see any repeat pattern that would justify that shift.
It seems that the input record, TGAGGT/T, is ambiguous and leads both normalizers to treat the deletion as running from the first T to the last G (TGAGG) rather than from the G to the final T (GAGGT). So I tried a more explicit description of the variant as VCF input:
1 17371283 1:17371287_GAGGT/- ATATGAGGTTTGTCT ATATTTGTCT
However, the result is the same; the variant is again shifted to the left:
1 17371285 1:17371287_GAGGT/- ATGAGG A
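For comparison, here is a minimal Python sketch of the trim-and-extend left-alignment procedure described in the vt paper (Tan et al., 2015), as I understand it: drop trailing bases shared by both alleles, prepend the preceding reference base whenever an allele becomes empty, then drop shared leading bases. It assumes the REF of the padded record above (ATATGAGGTTTGTCT, starting at 17371283) matches the genome. `normalize`, `ref_base`, `REF_SEQ` and `REF_START` are hypothetical names, and this is not the actual vt or bcftools code.

```python
# Reference window copied from the padded REF in the post; ASSUMED to match
# the genome. REF_SEQ[0] is chromosome 1 position 17371283.
REF_START = 17371283
REF_SEQ = "ATATGAGGTTTGTCT"


def ref_base(pos):
    """Reference base at 1-based position `pos` inside the window."""
    if not REF_START <= pos < REF_START + len(REF_SEQ):
        raise ValueError(f"position {pos} is outside the reference window")
    return REF_SEQ[pos - REF_START]


def normalize(pos, ref, alt):
    """Left-align and trim a biallelic VCF-style record (POS, REF, ALT)."""
    changed = True
    while changed:
        changed = False
        # Drop a trailing base shared by both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, prepend the preceding reference base.
        if not ref or not alt:
            pos -= 1
            base = ref_base(pos)
            ref, alt = base + ref, base + alt
            changed = True
    # Drop leading bases shared by both alleles, keeping at least one base each.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


print(normalize(17371286, "TGAGGT", "T"))                    # (17371285, 'ATGAGG', 'A')
print(normalize(17371283, "ATATGAGGTTTGTCT", "ATATTTGTCT"))  # (17371285, 'ATGAGG', 'A')
```

Under that assumption, the short and the padded records reduce to the same POS/REF/ALT, which is the same output both tools reported.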
Am I missing something? Any help would be very welcome :)
Note: websites refer to the rs786202100 indel with both coordinate ranges, 1:17371286-17371290 and 1:17371287-17371291 (ex1, ex2), which makes everything a bit more confusing.
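The two coordinate ranges can be compared with a short sketch: it shifts the 5 bp deletion left while the base before it equals the last deleted base, and right while the base after it equals the first deleted base, listing every placement that leaves the same sequence behind. It again assumes the 15 bp REF from the padded record is the true reference context; `placements` and `haplotype` are hypothetical helpers, not part of any tool.

```python
# Reference window copied from the padded REF in the post; ASSUMED to match
# the genome. REF_SEQ[0] is chromosome 1 position 17371283.
REF_START = 17371283
REF_SEQ = "ATATGAGGTTTGTCT"


def placements(pos, length, seq=REF_SEQ, start=REF_START):
    """Return every 1-based (first, last) interval equivalent to deleting
    `length` bases starting at `pos`, i.e. leaving the same haplotype."""
    i = pos - start
    # A left shift is allowed while the base before the deletion equals the
    # last deleted base.
    lo = i
    while lo > 0 and seq[lo - 1] == seq[lo + length - 1]:
        lo -= 1
    # A right shift is allowed while the base after the deletion equals the
    # first deleted base.
    hi = i
    while hi + length < len(seq) and seq[hi + length] == seq[hi]:
        hi += 1
    return [(start + k, start + k + length - 1) for k in range(lo, hi + 1)]


def haplotype(first, last, seq=REF_SEQ, start=REF_START):
    """Sequence left after deleting positions first..last (inclusive)."""
    return seq[: first - start] + seq[last - start + 1 :]


intervals = placements(17371287, 5)
print(intervals)                                 # [(17371286, 17371290), (17371287, 17371291)]
print({haplotype(a, b) for a, b in intervals})   # {'ATATTTGTCT'}
```

With only 15 bp of context the loops could in principle stop at the window edge; here both stop on a mismatch, so a wider window would not add further placements.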