Dear Community,
I would like to count the total number of heterozygous and homozygous SNPs in my VCF file. I have searched other forums but can't find an answer or guidance on how to do this.
Is there a simple way to calculate the total heterozygous SNPs and total homozygous SNPs for a given VCF file? For example, I am working with 7 parasite genomes, for which I have mapped reads and called variants using GATK. For parasite_1.vcf, I would like to know what percentage of the called SNPs are homozygous and what percentage are heterozygous. I would like to summarize this information in a table like the one below.
Strain    SNPs     Homozygous   Heterozygous
strain1   11,091   7,857        3,234
strain2   10,772   6,355        4,367
etc.
Here, SNPs is the total SNP count, Homozygous is the total homozygous SNP count, and Heterozygous is the total heterozygous SNP count for a given strain's VCF file.
Any guidance or advice is greatly appreciated.
See also: "How to estimate heterozygosity by sample from a multi-sample VCF file".
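If a small script is acceptable, the counts asked for above can be sketched in plain Python by reading the GT field of each record. This is only a minimal sketch under several assumptions that are not stated in the question: each VCF holds one sample (or the sample of interest is selected with the hypothetical `sample_index` argument), a site counts as a SNP only if REF and every ALT allele is a single A/C/G/T base, "homozygous" means homozygous for a non-reference allele, and sites with a homozygous-reference or missing genotype are skipped. The function names are made up for illustration and are not part of GATK or any other tool.

```python
import gzip
import re
from collections import Counter

BASES = {"A", "C", "G", "T"}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def is_snp(ref, alt):
    """True if REF and every ALT allele is a single A/C/G/T base."""
    return ref.upper() in BASES and all(a.upper() in BASES for a in alt.split(","))


def classify_genotype(gt):
    """Classify a GT string such as '0/1', '1|1' or './.'."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def count_snp_zygosity(lines, sample_index=0):
    """Count het and hom-alt SNP genotypes for one sample column."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + sample_index:
            continue  # no genotype column for this sample
        ref, alt, fmt = fields[3], fields[4], fields[8]
        if not is_snp(ref, alt):
            continue  # indels, symbolic alleles, '*' and '.' are ignored
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt = fields[9 + sample_index].split(":")[keys.index("GT")]
        kind = classify_genotype(gt)
        if kind in ("het", "hom_alt"):
            counts[kind] += 1
    return {
        "snps": counts["het"] + counts["hom_alt"],
        "homozygous": counts["hom_alt"],
        "heterozygous": counts["het"],
    }


def print_table(paths):
    """Print one tab-separated summary row per VCF file."""
    print("Strain\tSNPs\tHomozygous\tHeterozygous")
    for path in paths:
        with open_vcf(path) as handle:
            c = count_snp_zygosity(handle)
        print(f"{path}\t{c['snps']}\t{c['homozygous']}\t{c['heterozygous']}")
```

Calling `print_table(["parasite_1.vcf"])` would print one row for that file; percentages follow by dividing each count by the SNPs column. If the callset was made with a haploid ploidy setting, genotypes such as `1` are counted as homozygous here and no heterozygous calls can appear, so it is worth checking the ploidy used during variant calling before interpreting the numbers.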