Dear Biostars community,
I have 70 gVCFs from patients with a rare disease. I am interested in three regions, so I merged the gVCFs into a multi-sample VCF restricted to those three regions and annotated it with VEP. Two of the regions are known to be associated with the disease in question; the third is under investigation.
I want to pick out individuals who ONLY have variants in the region under investigation, i.e. a patient with a variant in either of the two "known" regions is excluded. The desired output is a list of sample IDs that I could use to subset the VCF, giving a VCF of variants in those people who only have variants in the region of interest.
I'm not sure how to approach this and was wondering whether a bcftools method would help. bcftools isec seems to be variant-based rather than sample-based.
Many thanks for your help
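One way to frame the selection described in the question is as a set difference: collect the samples that carry a non-reference genotype in the region under investigation, then remove any sample that carries one in either known region. Below is a minimal Python sketch of that idea, not a tested pipeline. The function names, the region coordinates and the toy VCF are all hypothetical, and it assumes that GT is the first FORMAT subfield and that any non-reference allele counts as "having a variant".

```python
def parse_region(region):
    """Parse 'chrom:start-end' (1-based, inclusive) into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


def in_region(chrom, pos, region):
    r_chrom, start, end = region
    return chrom == r_chrom and start <= pos <= end


def carries_alt(sample_field):
    """True if the GT subfield holds at least one non-reference allele.

    Assumption: GT is the first colon-separated subfield; '.' (missing)
    and '0' (reference) are not counted as variants.
    """
    gt = sample_field.split(":")[0]
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def samples_only_in_target(vcf_lines, target, known):
    """Return sorted sample IDs with variants in `target` but in none of `known`."""
    target = parse_region(target)
    known = [parse_region(r) for r in known]
    samples = []
    in_target, in_known = set(), set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the FORMAT column
            continue
        chrom, pos = fields[0], int(fields[1])
        carriers = {s for s, f in zip(samples, fields[9:]) if carries_alt(f)}
        if in_region(chrom, pos, target):
            in_target |= carriers
        if any(in_region(chrom, pos, r) for r in known):
            in_known |= carriers
    return sorted(in_target - in_known)


# Toy data (hypothetical): S1 has a variant only in the target region,
# S2 only in a known region, S3 in both.
HEADER = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(HEADER + ["S1", "S2", "S3"]),
    "\t".join(["chr1", "150", ".", "A", "G", ".", "PASS", ".", "GT", "0/1", "0/0", "0/1"]),
    "\t".join(["chr2", "250", ".", "C", "T", ".", "PASS", ".", "GT", "0/0", "0/1", "0/1"]),
    "\t".join(["chr3", "350", ".", "G", "A", ".", "PASS", ".", "GT", "0/0", "0/0", "./."]),
])

keep = samples_only_in_target(
    EXAMPLE_VCF.splitlines(),
    target="chr1:100-200",                    # placeholder coordinates
    known=["chr2:200-300", "chr3:300-400"],   # placeholder coordinates
)
print("\n".join(keep))
```

The resulting IDs could be written one per line to a file and passed to `bcftools view -S` to subset the merged VCF. For a real file you would read an uncompressed VCF line by line (or use a dedicated VCF parser) instead of the in-memory toy string.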
Thanks - I think I can do this in bcftools, which I am more comfortable with, but I had not thought of this simple approach to the problem.
All the best