Question

GATK's UnifiedGenotyper and Genotype Determination


12.7 years ago

DG 7.3k

I'm using the GATK UnifiedGenotyper to call variants in my exome sequences. My overall pipeline basically follows the Broad's recommended "best practices", along with the recommended details for exome analysis pipelines that have been posted here and over on SeqAnswers. Looking at some of my variants, I am finding some of the genotype calls a little odd. For instance, some variants are called as homozygous even though there were nearly equal numbers of reads for the reference and alternate alleles, and the DP read depth suggests that most of the total reads were kept (e.g. 158 Ref, 123 Alt, DP of 250). Looking at the Phred-scaled likelihoods, the homozygous alt is of course 0 and the heterozygous is at 15, so going by the likelihoods there is considerable ambiguity in the call. Does anyone have insight into why the homozygous alt call is likely to have been favoured in cases like this?

gatk exome genotyping snp • 4.3k views


score 0 · Answer 1 · 2012-08-11


12.7 years ago

matted 7.8k

Check out their FAQ "Why didn't the Unified Genotyper call my SNP? I can see it right there in IGV!".

Without knowing too much about your specific case, I'd start with two of their suggestions in particular: "What do the mapping qualities look like for the reads with the non-reference bases?" and "What do the base qualities look like for the non-reference bases?".
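To make the base-quality question concrete, here is a minimal sketch of the textbook diploid genotype likelihood model, in which each observed base is weighted by its Phred quality. This is an illustration only, not GATK's actual implementation: the function names are hypothetical, mapping quality and any filtering are ignored, and the uniform Q30 base quality is an assumption (only the 158/123 read counts come from the question).

```python
import math

def phred_to_error(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)

def genotype_log10_likelihoods(observations, ref, alt):
    """Return log10 P(data | genotype) for hom_ref, het and hom_alt.

    observations: iterable of (base, phred_quality) pairs, one per read.
    Each read is assumed to come from either allele with probability 1/2.
    """
    genotypes = {
        "hom_ref": (ref, ref),
        "het": (ref, alt),
        "hom_alt": (alt, alt),
    }
    result = {}
    for name, (a1, a2) in genotypes.items():
        total = 0.0
        for base, qual in observations:
            e = phred_to_error(qual)
            # Probability of seeing this base given each underlying allele.
            p1 = 1 - e if base == a1 else e / 3
            p2 = 1 - e if base == a2 else e / 3
            total += math.log10(0.5 * p1 + 0.5 * p2)
        result[name] = total
    return result

def to_pl(log10_likelihoods):
    """Normalise to Phred-scaled likelihoods so the best genotype is 0."""
    best = max(log10_likelihoods.values())
    return {g: round(-10 * (v - best)) for g, v in log10_likelihoods.items()}

# Read counts from the question; Q30 for every base is an assumption.
reads = [("A", 30)] * 158 + [("G", 30)] * 123
pl = to_pl(genotype_log10_likelihoods(reads, ref="A", alt="G"))
print(pl)
```

Under these assumptions the heterozygous genotype gets PL 0 by a wide margin, so a homozygous alt call at such a site suggests that the bases the genotyper actually used differ from the raw counts. Swapping in the real per-read base qualities is one way to check that.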



It isn't that the SNP isn't called; the SNP is called. It just seems as if the likelihoods for the genotypes aren't splitting the way I would expect. I am guessing it is a combination of base quality and mapping quality; however, neither was below the threshold for filtering. Only about 20 reads in total were filtered, and there aren't a high number of alternate alleles or spanning deletions.

My best guess would be base quality, with a very slight bias towards reference alleles. You can see in the Phred-scaled likelihoods that there is quite a bit of uncertainty, with a nearly equal split between the homozygous and heterozygous calls.
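For reference, the PL values can be turned into relative genotype probabilities to see how large the gap is. The sketch below assumes a flat prior over genotypes and considers only the two PLs quoted in the question (heterozygous 15, homozygous alt 0); the hom-ref PL was not given, so it is left out, and the function name is hypothetical.

```python
def pl_to_probabilities(pls):
    """Convert Phred-scaled likelihoods to normalised probabilities.

    Assumes a flat prior across the listed genotypes.
    """
    raw = [10 ** (-pl / 10) for pl in pls]
    total = sum(raw)
    return [r / total for r in raw]

# PLs from the question: heterozygous = 15, homozygous alt = 0.
het_prob, hom_alt_prob = pl_to_probabilities([15, 0])
print(f"het: {het_prob:.3f}, hom alt: {hom_alt_prob:.3f}")
```

A PL difference of 15 corresponds to a likelihood ratio of 10^1.5, or roughly 32 to 1, in favour of the genotype with PL 0.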

ADD REPLY • link 12.7 years ago by DG 7.3k