Hi, for my master's thesis I have to implement a tool for detecting genome variations. I am now learning how to represent structural variations in a VCF file, but after reading the example in http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 I am a little confused.
The example says:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
1 2827693 . CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA C . PASS SVTYPE=DEL;END=2827680;BKPTID=Pindel_LCS_D1099159;HOMLEN=1;HOMSEQ=C;SVLEN=-66 GT:GQ 1/1:13.9
The length of the reference sequence is 70 bp (69 bp if we remove the first base, which doesn't belong to the deletion), but SVLEN here is -66, and if we subtract POS from END we get -13. Shouldn't SVLEN be -69 and END be 2827762? Or maybe I'm missing something here.
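To make my arithmetic explicit, here is a minimal Python sketch of the check I did by hand. It assumes that END is POS + length(REF) - 1 and that SVLEN is length(ALT) - length(REF) (so negative for a deletion), which is how I read the spec. The function name `expected_deletion_fields` is my own and does not come from any VCF library.

```python
# Values copied from the example record above.
POS = 2827693
REF = "CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA"
ALT = "C"
INFO_END = 2827680   # END as given in the record
INFO_SVLEN = -66     # SVLEN as given in the record


def expected_deletion_fields(pos, ref, alt):
    """Hypothetical helper: derive END and SVLEN from POS, REF and ALT.

    Assumes a precise deletion where REF and ALT share the first (padding) base.
    """
    # Last reference base covered by REF (1-based, inclusive).
    end = pos + len(ref) - 1
    # Difference in length between ALT and REF; negative for a deletion.
    svlen = len(alt) - len(ref)
    return end, svlen


end, svlen = expected_deletion_fields(POS, REF, ALT)
print("REF length:", len(REF))
print("expected END:", end, "| END in record:", INFO_END)
print("expected SVLEN:", svlen, "| SVLEN in record:", INFO_SVLEN)
print("END in record minus POS:", INFO_END - POS)
```

Under those assumptions the expected values do not match the ones in the record, which is exactly what confuses me.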
Thanks for clarifying this.
Congratulations on having the same master's thesis project as me! :-)