I'm writing a Python script to convert Clustal-formatted alignments into VCF files. I'm stuck on one thing, namely how to interpret a gap at the start of an alignment:
ENG1-REF-K ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGCAGAACTTTGATTTT
MERS_EMC_V ---------------------------------------------CAGAACTTTGATTTT
                                                        ***************
The VCF format seems to assume that there is always a base upstream of a deletion. For example, if the reference is ACGT and the aligned sequence is A-GT, the VCF record should be REF: AC, ALT: A. The deleted base is at position 2, but the record's POS is 1, because VCF anchors the deletion on the preceding base.
http://samtools.github.io/hts-specs/VCFv4.2.pdf
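To make the internal case concrete, here is a minimal sketch of how I understand the anchoring rule. The function name `deletion_to_vcf` is just my own placeholder, and it assumes the two rows have equal length, the reference row has no gaps, and the base before the gap is the same in both rows. It deliberately refuses a gap in column 1, since that is exactly the case I don't know how to handle.

```python
def deletion_to_vcf(ref_aln, alt_aln):
    """Return (POS, REF, ALT) for the first run of gaps in alt_aln.

    Assumptions (mine, for illustration): equal-length rows, no gaps in
    the reference row, and the gap run does not start in column 1.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned rows must have equal length")

    start = alt_aln.find("-")  # 0-based column of the first gap
    if start == -1:
        return None  # no deletion in this row
    if start == 0:
        # No upstream base to anchor on: the open question.
        raise NotImplementedError("gap at alignment start")

    # Extend to the end of the gap run.
    end = start
    while end < len(alt_aln) and alt_aln[end] == "-":
        end += 1

    anchor = start - 1          # 0-based index of the base before the gap
    pos = anchor + 1            # VCF POS is 1-based
    ref = ref_aln[anchor:end]   # anchor base plus the deleted bases
    alt = alt_aln[anchor]       # anchor base only
    return pos, ref, alt


print(deletion_to_vcf("ACGT", "A-GT"))  # (1, 'AC', 'A')
```

With the alignment above, the MERS_EMC_V row starts with a gap, so this sketch would raise `NotImplementedError` rather than guess.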
How are terminal deletions, where there is no upstream base to anchor on, represented in VCF?