I am using HaplotypeCaller for calling the variants for 120 gene based target sequence. For gene PMS2, there is a variant with coverage 21 (in that position) Allele fraction of the alternate allele is 5 reads ( 24%). The mapping quality of the reads are mostly 0 , I geuss because of the very similar pseudogene PMS2CL (mapping done with hg19 using BWA-MEM). This variant is not getting called by haplotypecaller, but is actually a true variant (found with sanger sequencing). I compared the BAM file with bamout file, both are similar. I also tried mapping the sequence with custom reference sequence (target region based). when I used that bam file for variant calling, It called that specific variant (though it also increased the coverage depth and increased the number of variants many fold, which are false positives).
I wonder what can be the possible explanation to this. what is the cutoff criteria, which haplotypecaller is using in this case? why the variant is not getting called at first place?
Here is the link to screenshot of a PMS2 variant with coverage 21 (also atached as file) https://drive.google.com/file/d/0Bwibh75M75p_bGJrNlpyRTVSNHVZRDMzUFB0UDFOV2gyM2Rj/view?usp=sharing
I'm not seeing 21 reads in that screen shot - however, the reason is most likely due to "The mapping quality of the reads are mostly 0". These reads are probably filtered out before they are used for SNP calling because we cannot be confident about what they are telling us.
I don't know what else to say. If you want to call SNPs, you're going to need more high-quality data - or even better, more individuals known to have the same interesting genotype. Good luck! :)
http://gatkforums.broadinstitute.org/gatk/discussion/3131/determination-of-heterozygous-and-homozygous-calls
Im sorry, I don't understand - what did you want me to read here?