How can I get the total number of homozygous and heterozygous non-reference genotypes from a multi-sample VCF file? For example, if the non-reference allele is B, I want to know how many samples have AB and how many have BB in that VCF file, i.e. the number of samples with each genotype. Is there any software tool available for that?
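You can find that out by looking at the GT (genotype) field in the VCF file, where 0 is the REF allele and 1, 2, ... are the ALT alleles, so 0/1 corresponds to AB and 1/1 to BB.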
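As a minimal sketch of how the GT field encodes this (written in Python, not part of the original answer), the function below classifies a single GT string relative to one ALT allele. The name `genotype_class` is hypothetical; it assumes calls whose alleles are separated by `/` (unphased) or `|` (phased), and it counts a multi-allelic call such as 1/2 as heterozygous for both ALT alleles.

```python
import re


def genotype_class(gt, allele_index):
    """Classify one GT string with respect to one ALT allele.

    allele_index is 1 for the first ALT allele, 2 for the second, and so on
    (0 is the REF allele). Returns "hom_alt", "het", or None when the allele
    is absent or the call is missing.
    """
    alleles = re.split(r"[/|]", gt)  # handles unphased "/" and phased "|"
    if "." in alleles:
        return None  # missing or partially missing call, e.g. "./." or "1/."
    copies = alleles.count(str(allele_index))
    if copies == len(alleles):
        return "hom_alt"  # e.g. 1/1, i.e. BB
    if copies > 0:
        return "het"  # e.g. 0/1 (AB); 1/2 counts as het for both ALT alleles
    return None
```

Counting the samples for an allele is then a matter of applying this to the GT value of every sample column on the line and tallying the results.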
I recommend using the VariantAnnotation R package, as you can do it in two lines: one to load the VCF and another to parse and summarise the VCF genotypes.

@d-cameron, thank you very much for the prompt reply. However, I am not very familiar with programming. Is there another way to do this using existing software such as VCFtools, BCFtools or GATK? I would also like to add this information for each alternate allele as a column in my ANNOVAR-annotated variant file (in Excel).
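For a whole file, the hypothetical Python script below (a standard-library-only sketch, not a VCFtools, BCFtools or GATK feature) writes one row per ALT allele with the number of heterozygous and homozygous-ALT samples to a tab-separated file that Excel can open. It assumes an uncompressed or bgzipped VCF with genotype columns and GT as the first FORMAT key (which the VCF specification requires whenever GT is present); missing calls such as `./.` are skipped, and 1/2 counts as heterozygous for both ALT alleles.

```python
import csv
import gzip
import re
import sys


def allele_counts(lines):
    """Yield (CHROM, POS, REF, ALT, n_het, n_hom_alt) for every ALT allele.

    `lines` is any iterable of VCF text lines; header lines are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], fields[1], fields[3]
        alts = fields[4].split(",")
        # Split each sample's GT (first FORMAT key) into allele indices.
        calls = [re.split(r"[/|]", s.split(":")[0]) for s in fields[9:]]
        calls = [c for c in calls if "." not in c]  # drop missing calls
        for i, alt in enumerate(alts, start=1):
            copies = [c.count(str(i)) for c in calls]
            # Homozygous ALT: every allele in the call is this ALT (BB).
            n_hom = sum(1 for c, k in zip(calls, copies) if k == len(c))
            # Heterozygous: this ALT is present but not in every position (AB).
            n_het = sum(1 for c, k in zip(calls, copies) if 0 < k < len(c))
            yield chrom, pos, ref, alt, n_het, n_hom


def main(vcf_path, out_path):
    # bgzipped VCFs are gzip-compatible, so gzip.open can read them.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["CHROM", "POS", "REF", "ALT", "N_HET", "N_HOM_ALT"])
        writer.writerows(allele_counts(vcf))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example, `python vcf_gt_counts.py input.vcf.gz counts.tsv` (the script and file names are placeholders). The CHROM, POS, REF and ALT columns could then serve as a key for matching rows in the ANNOVAR spreadsheet, although ANNOVAR may represent indel alleles differently (for example with `-`), so indels may need extra care when matching.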