In the new version of samtools, additional fields such as AD, ADF and ADR can be added to the INFO column:
- AD: Total allelic depth
- ADF: Total allelic depths on the forward strand
- ADR: Total allelic depths on the reverse strand
According to the manual, each of these tags has the following format (taking ADR as an example): number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases. So far, so good.
My question is: why am I getting only 3 of the 4 counters for some variants? For example:
> chr1 52 . C G 19.4215 . DP=5;VDB=0.1;SGB=-0.453602;RPB=0;MQB=0.75;BQB=1;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=2,0,2,0;MQ=45 GT:PL:DP:ADF:ADR:AD:GQ 0/1:52,0,67:4:2,2,0:0,0,0:2,2,0:54
> chr1 141 . T A 164 . DP=32;VDB=2.5082e-05;SGB=-0.670168;RPB=0.958048;MQB=0.92664;MQSB=0.960728;BQB=0.79189;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=6,1,7,3;MQ=44 GT:PL:DP:ADF:ADR:AD:GQ 0/1:197,0,151:17:6,7,0,0:1,1,2,0:7,8,2,0:127
Here are the relevant fields shown more clearly:
GT:PL:DP:ADF:ADR:AD:GQ
0/1:52,0,67:4:2,2,0:0,0,0:2,2,0:54
0/1:197,0,151:17:6,7,0,0:1,1,2,0:7,8,2,0:127
EDIT: For some variants I'm also seeing only 2 counters. Any clue?
EDIT_2: After reading a little more, I found this definition: AD=Allelic depths for the ref and alt alleles in the order listed. I interpret this to mean that there should be as many AD values as there are alleles. However, as the examples above show, both records have only 2 alleles (one REF and one ALT), yet they carry 3 and 4 values respectively.
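To make the mismatch easy to spot across a whole file, here is a minimal sketch that compares the number of AD/ADF/ADR values with the number of alleles in a record. The function name `allelic_depth_counts` is hypothetical, and the sketch assumes a single-sample record with whitespace-separated columns, as in the records quoted above; it only reports the counts and does not explain where the extra values come from.

```python
def allelic_depth_counts(vcf_line, tags=("AD", "ADF", "ADR")):
    """Return (n_alleles, {tag: n_values}) for the first sample of a VCF record.

    Assumption: columns are whitespace-separated and laid out as
    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE.
    """
    fields = vcf_line.split()
    alt = fields[4]
    # One REF allele plus every comma-separated ALT allele ("." means no ALT).
    n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))

    # Pair each FORMAT key with the matching value of the first sample.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))

    # Count the comma-separated values stored under each requested tag.
    counts = {tag: len(sample[tag].split(",")) for tag in tags if tag in sample}
    return n_alleles, counts


record = ("chr1 52 . C G 19.4215 . DP=5;DP4=2,0,2,0;MQ=45 "
          "GT:PL:DP:ADF:ADR:AD:GQ 0/1:52,0,67:4:2,2,0:0,0,0:2,2,0:54")
n_alleles, counts = allelic_depth_counts(record)
for tag, n_values in counts.items():
    if n_values != n_alleles:
        print(f"{tag}: {n_values} values for {n_alleles} alleles")
```

Running this over every record would show which variants carry 2, 3 or 4 values, which may help narrow down what the affected sites have in common.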
How did you obtain your data? Did you use something like vcf_parser file.vcf --split?
This sounds like a good question. Does anybody have an answer?