I am using GATK's UnifiedGenotyper to call variants on some of my next-gen sequencing data. The organism is haploid, and I used the -ploidy flag to tell UnifiedGenotyper so. The issue is that in haploid mode, when a site has 52% of reads supporting an alternate allele and 48% supporting the reference, the call goes to the majority allele instead of being annotated as a heterozygote. As a result, in regions that may be duplicated you can get calls in either direction depending on read depth. Additionally, these variants are only reported when the alternate allele is in the majority; when the reference is in the majority, the variant is not called at all.
For example:
II 4533636 . C T 943 GT:AD:DP:GQ:MLPSAC:MLPSAF:PL 1:46,69:115:99:1:1.00:973,0
In this case 69 of 115 reads supported the alternate allele, so the variant was called. At another site the read counts went the other way according to IGV, and no variant was called. Is there a way in UnifiedGenotyper to raise the percentage of reads needed to call a variant, so that I can filter out these false SNPs?
You can write a filtering script that takes your VCF file and flags these variants as "FAIL". This is what I normally do for SNPs that could be false positives.
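Something along these lines might work as a starting point. It is only a sketch, not a tested pipeline: it assumes a standard tab-separated, single-sample VCF with an AD entry in the FORMAT column and a haploid GT (a single allele index), and the 0.9 cutoff and the function names are placeholders I made up, so tune them for your data.

```python
import sys

# Hypothetical cutoff: minimum fraction of AD reads that must support the
# called allele. Not a GATK default; pick a value that suits your data.
MIN_ALT_FRACTION = 0.9


def called_allele_fraction(format_field, sample_field):
    """Fraction of AD reads supporting the called haploid allele, or None."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "GT" not in keys or "AD" not in keys:
        return None
    if len(values) != len(keys):
        return None
    gt = values[keys.index("GT")]
    # Assumes a haploid genotype such as "1"; skip no-calls and reference calls.
    if not gt.isdigit() or gt == "0":
        return None
    try:
        depths = [int(x) for x in values[keys.index("AD")].split(",")]
    except ValueError:
        return None
    allele = int(gt)
    total = sum(depths)
    if allele >= len(depths) or total == 0:
        return None
    return depths[allele] / total


def flag_line(line, min_fraction=MIN_ALT_FRACTION):
    """Return the VCF line with FILTER set to FAIL if support is too low."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # not a full single-sample record; leave it alone
    fraction = called_allele_fraction(fields[8], fields[9])
    if fraction is not None and fraction < min_fraction:
        fields[6] = "FAIL"  # column 7 is FILTER
    return "\t".join(fields) + "\n"


def filter_vcf(in_path, out_path, min_fraction=MIN_ALT_FRACTION):
    """Copy in_path to out_path, flagging low-support variants."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(flag_line(line, min_fraction))


if __name__ == "__main__" and len(sys.argv) == 3:
    filter_vcf(sys.argv[1], sys.argv[2])
```

For the example site above, AD is 46,69, so the called allele has 69/115 = 0.6 of the reads and would be flagged with a 0.9 cutoff. If you save this as, say, flag_low_support.py, you would run it as `python flag_low_support.py in.vcf out.vcf`. Ideally you would also add a matching ##FILTER line to the header so the FAIL tag is documented in the file.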
OK, yeah, that's a good idea. I was just wondering if there was a native parameter I could set that isn't obvious in the documentation.