Dear Biostars,
I have a multi-sample VCF of 80,000 people, split into roughly 1,500 chunk files that together span the human genome. It is annotated with VEP, including the LOFTEE plugin.
I would like to count, per chunk, how many High Confidence (annotated as "HC") loss-of-function (LoF) variants each person carries (either het or hom), and then add these up across chunks so I can see how many HC LoF variants there are per person.
Initially I was just using grep to pull out "HC", running bcftools view with the "alt" command, and then converting the chunked files into plink files to use its count function. Is there a clean way of doing this directly from a VCF, without the intermediate steps?
Many thanks for your help
(Similar question to: Count Of Variants)
Would the expression be something like "HC"? How would I define that? An example line with an HC call is below:
With the corresponding part of the VCF being:
something like:
-i 'INFO/CSQ ~ "|HC"'
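For comparison, here is a minimal Python sketch of the same count done straight from the VCF text, under some assumptions: the header carries the usual VEP `##INFO=<ID=CSQ,... Format: ...>` line, LOFTEE writes its call into a CSQ subfield named `LoF`, and a person "carries" a variant if their GT contains any non-reference allele (so het and hom each count once, and multi-allelic sites are not split by allele). The function name `count_hc_lof` is made up for illustration. Matching the `LoF` subfield by position also sidesteps the question of how `|` is treated inside a regular expression.

```python
import re
from collections import Counter

def count_hc_lof(lines, counts=None):
    """Add per-sample counts of HC LoF variants from one VCF chunk to `counts`."""
    counts = Counter() if counts is None else counts
    lof_idx = None   # position of the LoF subfield inside each CSQ entry
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # VEP lists the CSQ subfield order after "Format: " in the header
            fmt = re.search(r'Format: ([^">]+)', line).group(1).strip()
            lof_idx = fmt.split("|").index("LoF")
        elif line.startswith("#CHROM"):
            samples = line.split("\t")[9:]
            for s in samples:
                counts.setdefault(s, 0)  # keep people with zero hits
        elif line and not line.startswith("#"):
            if lof_idx is None:
                raise ValueError("no CSQ header with a LoF subfield found")
            cols = line.split("\t")
            info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
            entries = [t for t in info.get("CSQ", "").split(",") if t]
            # keep the site if any transcript annotation is HC
            if not any(t.split("|")[lof_idx] == "HC" for t in entries):
                continue
            gt_idx = cols[8].split(":").index("GT")
            for s, call in zip(samples, cols[9:]):
                alleles = re.split(r"[/|]", call.split(":")[gt_idx])
                # het or hom both count once; missing (.) and ref (0) do not
                if any(a not in ("0", ".") for a in alleles):
                    counts[s] += 1
    return counts
```

Passing the same `counts` object back in for each chunk adds the chunks up, e.g. `with gzip.open(path, "rt") as fh: count_hc_lof(fh, counts)` for each chunk file. With 80,000 samples this will be slow in pure Python, so treat it as a check on the logic rather than a production pipeline.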