Dear all,
I have a multisample vcf and want to extract the IDs of those with rare impactful SNPs with at least one alternative allele at the a loci within the region of interest.
I am running the following on a normalised multi-sample vcf which has been annotated with VEP:
bcftools view -r REGION OF INTEREST | filter_vep -filter 'IMPACT is MODERATE or IMPACT is HIGH' | filter_vep -filter MAX_AF <0.01 or not MAX_AF'' | bcftools view -i 'GT="alt"' | bcftools query -f "'[%CHROM:%PO %SAMPLE %GT\n]' >output.txt
I work in an airlocked environment with patient data so can't share the VCF I am using but it is a standard multi-sample vcf created by merging using bcftools.
I get an output with a variant and ID per line irrespective of the genotype e.g.:
chr1:123:456 sampleID123 ./.
chr1:123:456 sampleID234 ./.
chr1:123:456 sampleID156 0/1.
chr1:123:456 sampleID123 ./.
etc
I would like to just get the sample IDs with at least one alt genotype.
Many thanks
I don't understand your problem. In your example above , there is a "sample IDs with at least one alt genotype."
yes the 0/1 variant is the one I want but the output also includes those with two ./. variants. I only want those with at least 0/1.