Hi all,
We used multiplex PCR to enrich the target regions and then sequenced them on the HiSeq platform.
For germline variants, the ratio of the ref allele to the alt allele should ideally be around 0.5 for heterozygous variants.
However, in our data this ratio is sometimes less than 0.1 according to the GATK calling results.
I am wondering why this could happen for germline variants.
The most confusing thing is that we get different genotype calls for only slightly different ratios, e.g., 0.06 is called homozygous but 0.07 is called heterozygous.
It would be great if you could give me some suggestions.
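For reference, here is a minimal sketch of how such a ratio can be read off a VCF sample column. It assumes the ratio is the alt-read fraction, alt / (ref + alt), taken from the AD (allelic depth) field; the function name `allele_balance` and the example record are hypothetical and not real data.

```python
def allele_balance(format_field, sample_field):
    """Return the alt-read fraction alt / (ref + alt) from a VCF sample column.

    Returns None when AD is missing or no reads cover the site.
    """
    # FORMAT keys and sample values are both colon-separated, in the same order.
    fields = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = fields.get("AD")
    if ad is None or "." in ad.split(","):
        return None
    depths = [int(x) for x in ad.split(",")]
    ref, alt = depths[0], sum(depths[1:])  # first entry is ref, the rest are alt alleles
    total = ref + alt
    if total == 0:
        return None
    return alt / total


# Hypothetical record: 93 ref reads and 7 alt reads, called heterozygous.
ratio = allele_balance("GT:AD:DP:GQ:PL", "0/1:93,7:100:12:12,0,2500")
print(ratio)
```

Printing this value next to GT and GQ for each site makes it easier to see which calls sit close to the homozygous/heterozygous boundary.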
Finding a solution to this kind of problem during genotype assignment and variant calling is not easy, because it depends on a variety of factors, such as sequencing and mapping errors, that any variant calling software takes into consideration. Some of them can be taken care of using the methods mentioned by Kevin, but further filtering can also be done by making the genotype quality threshold stricter (although HaplotypeCaller itself applies it by default). Most of these quality values are Phred-scaled likelihoods, so they are the tool's estimate of the genotype, again taking sequencing and mapping errors into consideration, and that is the best it can estimate from the sequencing data.
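To make the Phred-scaled part concrete, here is a small sketch of how a stricter genotype quality filter could look. It relies on the usual conventions: a Phred value Q corresponds to a probability of 10^(-Q/10), and GQ is the gap between the two smallest PL values, capped at 99. The function names and the `min_gq=20` cutoff are assumptions for illustration, not a recommended setting.

```python
def phred_to_prob(q):
    """Convert a Phred-scaled value Q to a probability: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def genotype_quality(pl, cap=99):
    """GQ as the difference between the second-smallest and smallest PL, capped."""
    ordered = sorted(pl)
    return min(ordered[1] - ordered[0], cap)


def passes_gq(pl, min_gq=20):
    """True if the genotype is supported at least min_gq Phred units
    better than the next most likely genotype (min_gq is a hypothetical cutoff)."""
    return genotype_quality(pl) >= min_gq


# Hypothetical PLs in the order hom-ref, het, hom-alt.
borderline = [12, 0, 2500]   # het is best, but hom-ref is only 12 Phred units behind
confident = [0, 45, 600]     # hom-ref is clearly favoured

for pl in (borderline, confident):
    print(pl, genotype_quality(pl), passes_gq(pl))
```

A site like the first one, where two genotypes are separated by only a few Phred units, is exactly where a small change in read counts can flip the call, so a stricter GQ cutoff would drop it rather than trust either genotype.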
You can also try GATK's genotype refinement tool to refine your assigned genotypes, if you have a truth set for the kind of data you are exploring.
Thanks,
Yes, I do find it tough to establish the "true" call when allelic imbalance shows up.
The GATK refinement workflow requires a truth set, such as trio/pedigree data, as prior knowledge to adjust the variant calls. However, our data come from a sporadic population, and adjustment without such a dataset sometimes makes things even worse.
We also used the 1KG dataset as the truth set, and found similar results.
So we believe refinement will not work well if no trio/pedigree data are provided.