Hi all,
I am trying to extract a subset of sites from a multi-sample VCF file. I only want sites with at least one homozygous alternate-allele call. Some sites are poly-allelic, so the sites I want would have at least one 1/1 or 2/2 call.
I've looked through bcftools and vcftools but haven't found an elegant solution. For example, from the documentation it seems that bcftools view --genotype hom can filter for sites with at least one homozygous call, but these could be homozygous ref or homozygous alt, and I specifically want alt calls.
Any ideas would be greatly appreciated!
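One possible approach, if nothing built into the tools fits, is a small script. The following is a minimal sketch rather than a tested pipeline: it assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT key, and the function names (has_hom_alt, filter_hom_alt_sites) are made up for illustration. It keeps a site if any sample has a call such as 1/1, 2/2 or 2|2, and it ignores haploid and missing calls.

```python
import re


def has_hom_alt(gt):
    """Return True if every allele in the call is the same non-reference allele.

    Examples: "1/1" and "2|2" are True; "0/0", "0/1", "1/2", "./." and "1" are False.
    Haploid calls (a single allele) are deliberately not counted here.
    """
    alleles = re.split(r"[/|]", gt)
    if len(alleles) < 2:
        return False
    first = alleles[0]
    return first not in ("0", ".") and all(a == first for a in alleles)


def filter_hom_alt_sites(lines):
    """Yield header lines plus records with at least one homozygous-alt call."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass meta-information and the column header through
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotypes to inspect
        if fields[8].split(":")[0] != "GT":
            continue  # assumption: GT, when present, is the first FORMAT key
        # The genotype is the first colon-separated subfield of each sample.
        if any(has_hom_alt(sample.split(":")[0]) for sample in fields[9:]):
            yield line
```

To use it, pass any iterable of lines, for example an open file handle for a hypothetical input.vcf, and write the yielded lines to a new file. Because the check compares allele indices rather than looking for the literal string 1/1, it also picks up 2/2, 3/3 and so on at sites with several alternate alleles.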
You mentioned that "Some sites are poly-allelic." How are such sites represented in a VCF file? And does "poly-allelic" mean the same thing as "biallelic"?
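On the representation question: the two terms are not the same. A biallelic site has exactly one allele in the ALT column, while a poly-allelic (multi-allelic) site lists two or more ALT alleles separated by commas. Genotype numbers index into that list, with 0 for REF, 1 for the first ALT, 2 for the second ALT, and so on, which is why a 2/2 call can only occur at a multi-allelic site. Below is a small illustrative sketch; the function name describe_site is hypothetical, and it assumes a single tab-delimited VCF record.

```python
def describe_site(vcf_line):
    """Classify one VCF record by how many ALT alleles it lists.

    Returns (kind, alleles), where alleles[0] is REF and alleles[1:] are the
    ALT alleles, so that genotype index i refers to alleles[i].
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # "." in the ALT column means no alternate allele is recorded.
    alt_alleles = [] if alt == "." else alt.split(",")
    if not alt_alleles:
        kind = "no ALT allele"
    elif len(alt_alleles) == 1:
        kind = "biallelic"
    else:
        kind = "multiallelic"
    return kind, [ref] + alt_alleles


# Example record: REF is A, ALT is "G,T", so a 2/2 call means T/T.
record = "1\t200\t.\tA\tG,T\t.\tPASS\t.\tGT\t0/1\t2/2"
kind, alleles = describe_site(record)
```

Here alleles[2] is the allele that a genotype index of 2 refers to.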