Does anyone know how to filter heterozygous genotype calls from a VCF file where one of the two alleles is supported by only a single read? The allelic depth (AD) would be n,1 or 1,n, where n can be any depth. I suspect we are getting a lot of false heterozygous calls due to sequencing error. I tried vcffilter with the following command:
vcffilter -g "! ( GT = 0/1 & AD = 1,* OR AD = *,1 )"
No luck.
Does anyone else have any thoughts?
Thanks!
But then, why do you have such a vcf format? Shouldn't you have the depth for each individual allele?
Yes, I do. The genotype fields are in the following format:
GT:PL:DP:SP:AD:GQ
So a typical genotype data field would look like this:
0/1:20,0,93:4:0:3,1:23
In this case, samtools is calling the position heterozygous for the sample because there are 3 reads for the reference allele and 1 for the variant. I'd like to convert these calls to no-call (./.) if there is just one read supporting either allele.
I hope this makes sense. Thanks again for any help.
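For what it's worth, here is a minimal Python sketch of the conversion described above, in case a small script is easier than a filter expression. It assumes a tab-delimited, uncompressed VCF whose FORMAT column contains both GT and AD, and that AD is ordered reference first, then the alternate alleles in ALT order. The function names `mask_single_read_hets` and `filter_lines` are made up for this example and do not come from any existing tool.

```python
def mask_single_read_hets(line):
    """Return one VCF line with single-read heterozygous calls set to ./."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no sample columns, nothing to do
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:PL:DP:SP:AD:GQ
    if "GT" not in keys or "AD" not in keys:
        return line
    gt_i, ad_i = keys.index("GT"), keys.index("AD")
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if len(values) <= max(gt_i, ad_i):
            continue  # truncated sample field
        alleles = values[gt_i].replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles or alleles[0] == alleles[1]:
            continue  # only diploid heterozygous calls are considered
        depths = values[ad_i].split(",")
        try:
            # Assumption: allele index k maps to the k-th AD entry
            called = [depths[int(a)] for a in alleles]
        except (ValueError, IndexError):
            continue
        if "1" in called:  # exactly one read supports one of the alleles
            values[gt_i] = "./."
            fields[i] = ":".join(values)
    return "\t".join(fields) + "\n"


def filter_lines(lines):
    """Apply the masking to every line of an iterable of VCF lines."""
    for line in lines:
        yield mask_single_read_hets(line)
```

To run it on a file, you could `import sys`, call `sys.stdout.writelines(filter_lines(sys.stdin))`, and pipe the VCF through the script. Only the GT value is replaced; the other fields (PL, DP, AD, GQ) are left as they were, so the original evidence stays visible.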