To count the heterozygous variants in my .vcf file, I ran the following vcftools command on Linux:
$ vcftools --vcf SRR1611183.gatk.vcf --het
It returned the following:
INDV O(HOM) E(HOM) N_SITES F
NA12878 0 0.0 1101 0.00000
The N_SITES value matches the 'het_RA_count' => 1101 reported by vcf-stats; however, vcf-stats also lists 'het_AA_count' => 2, which is not reflected in the --het output. If I'm seeking the total number of heterozygous variants, should I just add them together (to get 1103)? Or is het_AA perhaps not truly heterozygous?
I'm open to a bcftools solution if it's more straightforward.
Kevin - this is brilliant! It works perfectly; all I needed to do was sum up the 1s in Excel.
Great - happy boating (oars... boats...)
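On the het_RA versus het_AA question above: as far as I understand vcf-stats, het_RA counts genotypes with one reference and one alternate allele (e.g. 0/1), while het_AA counts genotypes with two different alternate alleles (e.g. 1/2). Both are heterozygous, and vcftools --het appears to consider only biallelic sites, which would explain why the two het_AA calls are missing from its output. Below is a minimal Python sketch of that classification, not how either tool is implemented. The function names and the toy records are hypothetical, and it assumes an uncompressed, tab-separated, single-sample VCF with diploid calls.

```python
def classify_genotype(gt):
    """Classify a diploid GT string such as '0/1' or '1|2'."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "other"  # missing or non-diploid call
    a, b = alleles
    if a == b:
        return "hom_RR" if a == "0" else "hom_AA"
    # Heterozygous: either ref/alt, or two different alt alleles.
    return "het_RA" if "0" in (a, b) else "het_AA"


def count_genotypes(vcf_lines, sample_index=0):
    """Tally genotype classes for one sample across VCF data lines."""
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        gt = sample_values[format_keys.index("GT")]
        kind = classify_genotype(gt)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Toy records (hypothetical, for illustration only).
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",
    "1\t300\t.\tG\tA,T\t50\tPASS\t.\tGT\t1/2",
    "1\t400\t.\tT\tC\t50\tPASS\t.\tGT:DP\t1|0:12",
]

counts = count_genotypes(toy_vcf)
total_het = counts.get("het_RA", 0) + counts.get("het_AA", 0)
print(counts, "total heterozygous:", total_het)
```

If this reading is right, the total for the file in the question would be het_RA plus het_AA, i.e. 1101 + 2 = 1103. To run the sketch on a real file, pass an open file handle instead of the toy list, e.g. `count_genotypes(open("SRR1611183.gatk.vcf"))`.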