In whole exome variant calls you only ever get results for positions that differ from the reference - invariant or wild-type positions are only inferred. This is fine when looking at a single sample. However, when comparing lots of samples to each other - either as part of a trio study or a larger cohort analysis - it is not immediately obvious whether a non-variant position reflects a true reference allele call or simply a lack of data.
This causes a problem where, say, a patient with a condition has a mutant allele at a position, but their parents have no calls at that position. That could be because both parents are WT at that position and the patient has a spontaneous mutation, or because the parents have low/no read coverage there and so no call could be made. In the latter case it still looks like the patient has a spontaneous mutation, but one or both parents may actually carry the allele, which would make it a false-positive candidate mutation in the patient.
The only way I see of resolving this is to go back to the BAM files and, for every position where there is a candidate mutation in the patient but no call in a parent, check the parents' read depth at that position.
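To make that concrete, here is a rough sketch of the kind of check I have in mind. It assumes per-position depths for each parent have already been exported from the BAMs in the three-column, tab-separated format that `samtools depth` produces (chromosome, position, depth). The function names and the minimum depth of 10 are my own placeholders, not anything standard, and the threshold would need tuning for real data.

```python
import csv

MIN_DEPTH = 10  # assumed cut-off; pick a value that suits your data


def load_depths(handle):
    """Parse tab-separated (chrom, pos, depth) lines into a dict."""
    depths = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, pos, depth = row[0], int(row[1]), int(row[2])
        depths[(chrom, pos)] = depth
    return depths


def classify_candidates(candidates, mother, father, min_depth=MIN_DEPTH):
    """Flag candidate sites where a parent lacks the coverage to be called.

    candidates: iterable of (chrom, pos) variant sites in the patient.
    mother, father: dicts from load_depths(); a missing site counts as depth 0.
    """
    results = {}
    for site in candidates:
        low = [
            name
            for name, depths in (("mother", mother), ("father", father))
            if depths.get(site, 0) < min_depth
        ]
        if low:
            # Absence of a variant call here may just mean absence of data.
            results[site] = "low_coverage:" + ",".join(low)
        else:
            # Both parents have enough reads for a reference call to be plausible.
            results[site] = "covered"
    return results
```

Sites that come back as "covered" are the ones where a missing parental call can reasonably be read as reference. Anything flagged as low coverage cannot be separated from a missed variant on this evidence alone. Note that `samtools depth` omits zero-depth positions unless run with `-a`, which is why a missing site is treated as depth 0 here.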
Is anyone else looking at this? Or are there tools that do this already? Or do people just rely on confirmation with Sanger seq?
I'd be grateful for any comments or suggestions. Thanks.
Thanks for the reply.
Could you elaborate a bit on what you mean by 'well characterised reference positions', please? Do you mean known genotypes at certain loci for all three samples? Or something else?
Known genotypes of good quality and coverage were what I was driving at, regardless of the actual genotype, and yes, in all 3 samples.