Dear all,
I've been using GATK HaplotypeCaller / GenotypeGVCFs (v4.2.4.0) for a while, but recently I found something strange. At position 7063 there are 8 reads (3 T + 5 A), and HaplotypeCaller calls it as HET (see image, lower track):
NC_046966.1 7063 . T A,<NON_REF> 177.64 . BaseQRankSum=0.887;DP=8;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=2.369;RAW_MQandDP=16885,8;ReadPosRankSum=1.345 GT:AD:DP:GQ:PL:SB 0/1:3,5,0:8:89:185,0,89,194,104,298:0,3,1,4
However, GenotypeGVCFs changes it to HOM_ALT (first individual):
NC_046966.1 7063 . T A 2993.02 . AC=20;AF=0.185;AN=108;BaseQRankSum=0;DP=268;ExcessHet=0;FS=0;InbreedingCoeff=0.4814;MLEAC=25;MLEAF=0.231;MQ=44.67;MQRankSum=0.967;QD=27.24;ReadPosRankSum=1.38;SOR=0.754 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,5:5:15:1|1:7054_T_C:222,15,0:7054 0/0:1,0:1:3:.:.:0,3,15:. 0/0:0,0:0:0:.:.:0,0,0:. 0|1:2,6:8:59:0|1:7054_T_C:243,0,59:7054 0/0:0,0:0:0:.:.:0,0,0:. 0/0:1,0:1:3:.:.:0,3,30:. 1|1:0,9:9:27:1|1:7054_T_C:402,27,0:7054 0/0:6,0:6:18:.:.:0,18,239:. 1|1:0,4:4:12:1|1:7054_T_C:174,12,0:7054 0/0:16,0:16:48:.:.:0,48,656:. 1|1:0,7:7:21:1|1:7054_T_C:315,21,0:7054 1|1:0,7:7:21:1|1:7054_T_C:315,21,0:7054 0/0:7,0:7:21:.:.:0,21,228:. 1|1:0,14:14:42:1|1:7054_T_C:630,42,0:7054 0/0:0,0:0:0:.:.:0,0,0:. 0/0:2,0:2:6:.:.:0,6,57:. 0/0:5,0:5:15:.:.:0,15,156:. 0/0:3,0:3:0:.:.:0,0,0:. 0|1:2,6:8:66:0|1:7054_T_C:231,0,66:7054 0/0:6,0:6:0:.:.:0,0,155:. 0/0:8,0:8:24:.:.:0,24,272:. 0/0:4,0:4:12:.:.:0,12,129:. 0/0:5,0:5:15:.:.:0,15,185:. 0/0:4,0:4:12:.:.:0,12,141:. 0/0:3,0:3:9:.:.:0,9,113:. 0/1:2,4:6:65:.:.:137,0,65:. 0/0:3,0:3:6:.:.:0,6,90:. 0/0:9,0:9:27:.:.:0,27,338:. 0/0:9,0:9:27:.:.:0,27,350:. 0/0:9,0:9:27:.:.:0,27,338:. 0/0:4,0:4:12:.:.:0,12,102:. 0/0:11,0:11:33:.:.:0,33,433:. 1|1:0,4:4:12:1|1:7054_T_C:180,12,0:7054 0/0:3,0:3:0:.:.:0,0,0:. 0/0:1,0:1:3:.:.:0,3,30:. 0/0:6,0:6:0:.:.:0,0,167:. 0/0:4,0:4:0:.:.:0,0,0:. 0/0:2,0:2:6:.:.:0,6,57:. 0/0:1,0:1:3:.:.:0,3,30:. 1|1:0,3:3:9:1|1:7054_T_C:135,9,0:7054 0/0:1,0:1:0:.:.:0,0,0:. 0/0:2,0:2:6:.:.:0,6,31:. 0/0:5,0:5:0:.:.:0,0,152:. 0/0:0,0:0:0:.:.:0,0,0:. 0/0:4,0:4:12:.:.:0,12,141:. 0/0:11,0:11:33:.:.:0,33,423:. 0/0:4,0:4:12:.:.:0,12,129:. 0/1:2,2:4:71:.:.:71,0,71:. 0/0:2,0:2:6:.:.:0,6,45:. 0/0:3,0:3:9:.:.:0,9,87:. 0/0:7,0:7:21:.:.:0,21,240:. 0/0:6,0:6:18:.:.:0,18,224:. 0/0:1,0:1:3:.:.:0,3,42:. 0/0:4,0:4:12:.:.:0,12,129:.
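To make the discrepancy explicit, here is a minimal Python sketch that puts the two records for the first individual side by side. It assumes the standard VCF diploid PL ordering (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...) and that GQ is the second-smallest PL capped at 99. The helper names (`parse_sample`, `diploid_genotypes`, `summarize`) are just illustrative, not part of GATK.

```python
def parse_sample(format_str: str, sample_str: str) -> dict:
    """Zip a VCF FORMAT column with one sample column into a dict."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def diploid_genotypes(n_alleles: int) -> list:
    """Diploid genotypes in VCF PL order: index of (j, k) with j <= k is k*(k+1)/2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def summarize(format_str: str, sample_str: str) -> dict:
    """Report AD, PL, the most likely genotype (PL == 0) and GQ recomputed from PL."""
    s = parse_sample(format_str, sample_str)
    ad = [int(x) for x in s["AD"].split(",")]
    pl = [int(x) for x in s["PL"].split(",")]
    gts = diploid_genotypes(len(ad))
    assert len(pl) == len(gts), "PL length does not match the number of alleles in AD"
    best = min(range(len(pl)), key=pl.__getitem__)  # genotype with the lowest PL
    a, b = gts[best]
    return {
        "AD": ad,
        "DP": int(s["DP"]),
        "PL": pl,
        "best_gt": f"{a}/{b}",
        "GQ_from_PL": min(sorted(pl)[1], 99),  # second-smallest PL, capped at 99
        "GQ": int(s["GQ"]),
    }


# First individual: HaplotypeCaller GVCF record vs. GenotypeGVCFs output (copied from above).
gvcf = summarize("GT:AD:DP:GQ:PL:SB", "0/1:3,5,0:8:89:185,0,89,194,104,298:0,3,1,4")
joint = summarize("GT:AD:DP:GQ:PGT:PID:PL:PS", "1|1:0,5:5:15:1|1:7054_T_C:222,15,0:7054")

for name, rec in (("GVCF", gvcf), ("joint", joint)):
    print(name, rec)
```

Both records are internally consistent (the PLs support the reported GT and GQ); the difference is that the 3 REF reads (and DP 8 -> 5) disappear between the GVCF and the joint-genotyped record.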
I looked into the assembled output of HaplotypeCaller (see image, first track), and there are plenty of haplotypes supporting both alleles, but actually more of them support the reference allele (8 T vs. 4 A)! It seems that GenotypeGVCFs ignores the 8 REF haplotypes and calls HOM_ALT based on 4 haplotypes, while reporting 5 (AD=0,5). On top of that, the ALT allele is also present in other samples, but its frequency is not that high (AF=0.185).
Any idea of what might be going on?
Thanks in advance,