I am trying to identify controls in a VCF containing 10K samples and 40 variants. These controls should not carry any alternate alleles at any of the 40 variants.
Based on the SnpSift docs, it looks like I should be able to do something like this to keep only the samples that are homozygous reference for all variants:
cat $sample.vcf | java -jar SnpSift.jar filter "isHom( GEN[*] ) & isRef( GEN[*] )"
I still get all 10K samples in the output, which I know should not be the case, as some samples have alternate alleles for at least one of the 40 variants.
I've tried to filter a VCF by sample with SnpSift before, but have never been able to crack it. I can do this in Python, but I usually avoid custom coding for VCF stuff if possible. Any help is greatly appreciated.
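SnpSift filter works on variants (rows), not samples (columns): the expression decides which variant lines are kept, and every kept line still carries all 10K sample columns.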
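Since the filtering has to happen per sample, a small script may be the simplest route. Here is a minimal sketch in plain Python (standard library only) that walks the VCF once and returns the samples that are homozygous reference at every record. The function name `hom_ref_samples` is made up for illustration, and it assumes an uncompressed, tab-delimited VCF with a GT entry in FORMAT. It also assumes that a missing call such as `./.` should disqualify a sample; adjust that if missing calls are acceptable for your controls.

```python
def hom_ref_samples(lines):
    """Return names of samples whose GT contains only allele 0 at every record.

    `lines` is any iterable of VCF text lines (an open file handle works).
    Assumption: missing calls like ./. count as "not a control".
    """
    samples = []
    keep = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and ## meta-information lines.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            samples = fields[9:]
            keep = [True] * len(samples)
            continue
        # Locate GT within the FORMAT column (column 9).
        gt_idx = fields[8].split(":").index("GT")
        for i, call in enumerate(fields[9:]):
            if not keep[i]:
                continue  # already ruled out by an earlier variant
            gt = call.split(":")[gt_idx]
            alleles = gt.replace("|", "/").split("/")
            # Any allele other than "0" (alt or missing) rules the sample out.
            if any(a != "0" for a in alleles):
                keep[i] = False
    return [s for s, k in zip(samples, keep) if k]
```

To use it, open the VCF and pass the file handle, e.g. `with open("cohort.vcf") as fh: controls = hom_ref_samples(fh)`, where `cohort.vcf` is a placeholder file name. The resulting list of names can then be handed to whatever sample-subsetting tool you normally use.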