Hi all, I've been using samtools for a while to do variant calling. I have some questions about the indel VCF output.
Ex.1:
ch12 11542 . GTTTTTTTTT GTTTTTTTTTT 38.5 . INDEL;IS=11,0.647059;DP=17;VDB=6.858248e-02;AF1=1;AC1=2;DP4=0,0,5,6;MQ=60;FQ=-67.5 GT:PL:DP:GQ 1/1:79,33,0:11:62
In this example, the "real" insertion is only one "T", isn't it? Where is that "T" inserted? At the end of the variant, GTTTTTTTTTTT, so that the insertion position would be 11542+length(GTTTTTTTTTT)-1? Or at the beginning of the variant, GTTTTTTTTTT? Looking at IGV, it seems that the second option is the correct one. But what is the point of reporting the whole region GTTTTTTTTT->GTTTTTTTTTT instead of G->GT?
On the other hand, take a look at this other variant:
Ex.2:
` ch12 13971470 . TTTATTATTATTATTATTATTATTATTATTATTATTTCTATTATTATTATTATTATTATTTTTATTATTACTATTATTATTATTATTATTATTAT TTTTTATTATTATTATTATTATTATTATTATTATTATTATTTCTATTATTATTATTATTATTATTTTTATTATTACTATTATTATTATTATTATTATTAT 214.0 . INDEL;IS=9,0.450000;DP=20;VDB=1.562057e-01;AF1=1;AC1=2;DP4=0,0,3,15;MQ=55;FQ=-82.5 GT:PL:DP:GQ 1/1:255,48,0:18:93
`
What is going on here? Which sequence is the "inserted" one? I think this format is very confusing, or I do not quite see the point of it.
Any help would be really appreciated.
So that page states:
That seems to indicate that this samtools variant call is not normalized, which is why it reports the long form.
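To make the long form versus the short form concrete, here is a minimal Python sketch that trims the bases shared by REF and ALT. The function name `trim_alleles` is made up for illustration and is not part of samtools. It only trims shared trailing and leading bases; full normalization also left-aligns against the reference genome, which this sketch does not attempt.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper for illustration only. It never looks at the
    reference genome, so it cannot left-align a record that is not
    already anchored at its leftmost position.
    """
    # Drop shared trailing bases first, so the anchor base on the left stays.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop shared leading bases, shifting POS right for each one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# REF and ALT copied from Ex.1 in the question
print(trim_alleles(11542, "GTTTTTTTTT", "GTTTTTTTTTT"))
```

For a record like Ex.1, where ALT is REF plus one extra T, the trimming stops at G -> GT with the position unchanged.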
I cannot read the paper, sorry, but thank you Pierre and Istvan. The samtools format is a bit confusing, but I think I've got it now.
The "correct" interpretation is this: GTTTTTTTTTT, where the insertion happens just after the first base. To find the nucleotide at which the insertion ends, I think one possible way is to compute length(ALT)-length(REF) and add this number to the variant position.
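The arithmetic described here can be sketched in a few lines of Python. The helper name `insertion_span` is hypothetical, and it assumes the left-anchored reading: REF and ALT share their first base, and the inserted bases follow immediately after it.

```python
def insertion_span(pos, ref, alt):
    """Return (length, first, last) for an insertion read as left-anchored.

    Assumes ALT is longer than REF and that both start with the same
    anchor base at POS. `first` and `last` are counted along the ALT
    allele starting from POS; in reference coordinates the insertion
    simply sits between POS and POS + 1.
    """
    if len(alt) <= len(ref) or ref[0] != alt[0]:
        raise ValueError("not a left-anchored insertion")
    length = len(alt) - len(ref)  # number of inserted bases
    first = pos + 1               # first inserted base, right after the anchor
    last = pos + length           # last inserted base
    return length, first, last
```

For a single inserted base, `first` and `last` coincide at POS + 1.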
I just want to note here that, based on sequence alone, we can't tell where the actual insertion took place; technically it could have been anywhere between the Ts. It is a small thing, but it is important to note that "correct" in this sense means "normalized".
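A quick way to convince yourself of this is to insert a single T at every possible place in the REF allele of Ex.1 and compare the results. This is only a small illustrative sketch, not something samtools does internally.

```python
ref = "GTTTTTTTTT"  # REF copied from Ex.1

# Insert one "T" after the G, after the first T, ..., after the last T,
# and collect the distinct resulting sequences in a set.
results = {ref[:i] + "T" + ref[i:] for i in range(1, len(ref) + 1)}

print(len(results), results)
```

Every insertion point gives the same sequence, so the set holds a single element. The sequence alone cannot say which T is the "new" one, and placing it right after the G is just the convention that normalization picks.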