Hi All,
I am trying to troubleshoot all the missing genotypes in my VCF. I don't quite understand why I get missing genotypes (./.) when there are plenty of reads under AD and DP. I think it's because the GQ also appears to be 0. Can someone confirm this or shed some light on what's going on, please?
I copied some lines from the raw genotyped VCF. I used GATK4 HaplotypeCaller for variant calling > CombineGVCFs > GenotypeGVCFs.
PHUM306900 10716 . C T 210835.07 . AC=435;AF=0.867;AN=502;BaseQRankSum=0.654;DP=13029;ExcessHet=0.0000;FS=0.000;InbreedingCoeff=0.4908;MLEAC=755;MLEAF=1.00;MQ=60.00;MQRankSum=0.00;QD=31.27;ReadPosRankSum=1.10;SOR=0.781 GT:AD:DP:GQ:PGT:PID:PL:PS 1|1:0,52:52:99:1|1:10716_C_T:2252,156,0:10716 0|1:15,10:25:99:0|1:10716_C_T:375,0,600:10716 ./.:19,0:19:.:.:.:0,0,0 ./.:26,0:26:.:.:.:0,0,0 ./.:62,0:62:.:.:.:0,0,0
I should add that I already tried using FixVcfMissingGenotypes and it didn't change any calls on the samples I tried. Any insight would be much appreciated.
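For anyone who wants to list the affected samples at a site, here is a minimal Python sketch that pulls out the samples that are ./. despite having depth. The function name `missing_with_depth` and the `min_dp` threshold are just illustrative choices, and the record below is the one posted above with the INFO column shortened to `DP=13029` for readability.

```python
def missing_with_depth(vcf_line, min_dp=10):
    """Return (sample_index, DP, AD, PL) for ./. samples with DP >= min_dp."""
    # split() handles tabs (real VCF) and the spaces in the pasted record.
    cols = vcf_line.rstrip("\n").split()
    keys = cols[8].split(":")  # FORMAT column
    hits = []
    for idx, sample in enumerate(cols[9:]):
        # Trailing FORMAT fields may be dropped, so zip stops at the shorter list.
        fields = dict(zip(keys, sample.split(":")))
        gt = fields.get("GT", ".").replace("|", "/")
        if gt not in ("./.", "."):
            continue
        dp = fields.get("DP", ".")
        if dp.isdigit() and int(dp) >= min_dp:
            hits.append((idx, int(dp), fields.get("AD"), fields.get("PL")))
    return hits


# Record from the post above; INFO abbreviated to DP only.
record = (
    "PHUM306900 10716 . C T 210835.07 . DP=13029 GT:AD:DP:GQ:PGT:PID:PL:PS "
    "1|1:0,52:52:99:1|1:10716_C_T:2252,156,0:10716 "
    "0|1:15,10:25:99:0|1:10716_C_T:375,0,600:10716 "
    "./.:19,0:19:.:.:.:0,0,0 ./.:26,0:26:.:.:.:0,0,0 ./.:62,0:62:.:.:.:0,0,0"
)

for idx, dp, ad, pl in missing_with_depth(record):
    print(f"sample #{idx}: DP={dp} AD={ad} PL={pl}")
```

In this record, all three flagged samples have an AD with reference reads only and a PL of 0,0,0, so the depth is there but the likelihoods do not favor any genotype.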
Show us a few reads overlapping this position using samtools view.
Here are some from one sample. The called genotype for this sample is below, at position 10716:
And a few lines of the BAM. The position in question is clearly there. Also, this is a known, well-documented mutation associated with insecticide resistance. I am trying to understand why it gives ./. instead of calling the genotype. Thanks for responding.
Most reads have flag=1185 (~ PCR duplicates).
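To see where "duplicate" comes from, the FLAG value can be broken into its bits. This is a small Python sketch; the bit meanings follow the SAM specification, while the helper name `decode_flag` is only illustrative.

```python
# FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


for name in decode_flag(1185):
    print(name)
```

1185 = 1024 + 128 + 32 + 1, so the set bits are: read paired, mate reverse strand, second in pair, and PCR or optical duplicate. Note that the "proper pair" bit (0x2) is not set for this value.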
I did not realize that PCR duplicates still count as reads and appear under DP. Thank you for pointing that out. I have about 500 samples. Do you have any ideas about how I should verify that PCR duplicates are the reason for the missing calls even with DP>10, or do you know of something else that might cause this?
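One way to check this per sample is to count how many reads over the site carry the duplicate bit (0x400). Here is a rough Python sketch that tallies duplicates from SAM text lines such as those printed by samtools view. The function name `tally_duplicates` is illustrative, and the three example reads are made up purely to show the input shape.

```python
from collections import Counter

DUPLICATE = 0x400  # SAM FLAG bit: PCR or optical duplicate


def tally_duplicates(sam_lines):
    """Count duplicate vs. non-duplicate records in an iterable of SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["duplicate" if flag & DUPLICATE else "non_duplicate"] += 1
    return counts


# Made-up records for illustration only (not taken from the real BAM).
example = [
    "r1\t1185\tPHUM306900\t10700\t60\t50M\t=\t10650\t-100\t*\t*",
    "r2\t1185\tPHUM306900\t10701\t60\t50M\t=\t10651\t-100\t*\t*",
    "r3\t163\tPHUM306900\t10702\t60\t50M\t=\t10800\t148\t*\t*",
]
counts = tally_duplicates(example)
print(counts["duplicate"], counts["non_duplicate"])
```

On real data, the same function could read `sys.stdin` with the output of something like `samtools view sample.bam PHUM306900:10716-10716` piped in, where `sample.bam` is a placeholder for each of the ~500 files. If the non-duplicate count at the site is near zero for the ./. samples but not for the called ones, that would support duplicates as the explanation; if not, something else is likely going on.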