I'm using the GATK UnifiedGenotyper to call variants in my exome sequences. My pipeline basically follows the Broad's recommended "best practices", along with the recommended details for an exome analysis pipeline that have been posted here and over on SeqAnswers. Looking at my variants, I am finding some of the genotype calls a little odd. For instance, some variants are called homozygous even though there are nearly equal numbers of reads for the reference and alternate alleles, and the DP read depth suggests that most of the total reads were kept (e.g. 158 ref, 123 alt, DP of 250). In the Phred-scaled likelihoods, homozygous alt is of course 0 and heterozygous is 15, so going by the likelihoods there is considerable ambiguity in the call. Does anyone have insight into why the homozygous alt call would be favoured in cases like this?
To be clear, the SNP is being called; it just seems that the genotype likelihoods aren't splitting the way I would expect. I am guessing it is a combination of base quality and mapping quality; however, neither was below the threshold for filtering. Only about 20 reads in total were filtered, and there isn't a high number of alternate alleles or spanning deletions.
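For reference, this is roughly the split I would expect from a simple model. It is only a minimal sketch of a textbook diploid genotype likelihood, not GATK's actual implementation: the function name `naive_genotype_pls` and the uniform base error of 0.01 (Q20) are assumptions for illustration, and real per-base and mapping qualities are ignored.

```python
import math


def naive_genotype_pls(n_ref, n_alt, base_error=0.01):
    """Phred-scaled genotype likelihoods under a simplified diploid model.

    Assumes every read has the same (hypothetical) base error rate and
    that an erroneous base is equally likely to be any of the other three.
    """
    p_match = 1.0 - base_error        # read shows the allele it came from
    p_mismatch = base_error / 3.0     # read shows one specific wrong base
    p_het = 0.5 * p_match + 0.5 * p_mismatch  # either allele, one chromosome each

    log10_lik = {
        "hom_ref": n_ref * math.log10(p_match) + n_alt * math.log10(p_mismatch),
        "het": (n_ref + n_alt) * math.log10(p_het),
        "hom_alt": n_ref * math.log10(p_mismatch) + n_alt * math.log10(p_match),
    }

    # Normalise so the most likely genotype gets PL = 0, as in a VCF.
    best = max(log10_lik.values())
    return {g: round(-10.0 * (v - best)) for g, v in log10_lik.items()}


if __name__ == "__main__":
    # Read counts taken from the example in the question.
    print(naive_genotype_pls(n_ref=158, n_alt=123))
```

With 158 ref and 123 alt reads, this simple model gives heterozygous a PL of 0 and puts both homozygous genotypes thousands of PL units away, which is why the actual call surprises me.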
My best guess would be base quality, with a very slight bias towards reference alleles. You can see in the Phred-scaled likelihoods that there is quite a bit of uncertainty, with the homozygous and heterozygous calls separated by only 15.
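To put the PL values in perspective, they can be converted back to relative likelihoods. The snippet below is a small sketch; the helper name `pl_to_normalized_likelihoods` is hypothetical, and it assumes a flat prior over the genotypes listed. Only the heterozygous (15) and homozygous alt (0) values are given above, so the homozygous ref PL is left out.

```python
def pl_to_normalized_likelihoods(pls):
    """Convert Phred-scaled likelihoods (PL) to likelihoods that sum to 1.

    PL = -10 * log10(likelihood), scaled so the best genotype has PL 0.
    """
    raw = [10 ** (-pl / 10.0) for pl in pls]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    # PLs from the post: heterozygous = 15, homozygous alt = 0.
    het, hom_alt = pl_to_normalized_likelihoods([15, 0])
    print(f"het: {het:.3f}, hom alt: {hom_alt:.3f}")
```

A PL gap of 15 corresponds to a likelihood ratio of about 32 to 1 in favour of homozygous alt, so roughly 0.97 against 0.03 when only these two genotypes are compared.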