Dear Biostars,
I have NGS data for thousands of people merged into a multisample VCF, which has been split into genomic chunks (roughly 1000 chunks). These have been annotated with VEP. One of those annotations is the LOFTEE "high confidence" marker.
I want to count, per sample, the number of LOFTEE "HC" hits across all chunks, so the final output would simply be the sample ID and a number representing the total number of high confidence markers that person has.
I have tried:
# chunk_${i}.vcf.gz is a placeholder name for the i-th annotated chunk
bcftools view -S "$sample_list" "chunk_${i}.vcf.gz" | \
bcftools +split-vep \
-d \
-f '%CHROM\t%POS\t%REF\t%ALT\t%SYMBOL\t%Feature\t%Consequence\t%Existing_variation\t%LoF\n' \
-i 'LoF="HC"' \
-o "output_${i}.txt"
where ${i} indexes the chunks (the command runs as an array over all of them), but the output is not broken down by sample ID.
Any help would be greatly appreciated.
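For reference, the per-sample tally described above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name count_hc_per_sample and the chunk_*.vcf.gz file names are hypothetical, the bcftools call in the comments is untested on this data, and multiallelic sites are assumed to have been split beforehand (e.g. with bcftools norm -m-) so that a non-reference genotype refers to the HC allele. A sample is counted once per HC site at which its genotype carries at least one ALT allele.

```shell
# Sum per-sample carrier counts from "SAMPLE<TAB>GT" lines on stdin.
# Hypothetical helper; the name is not part of bcftools.
count_hc_per_sample() {
  awk -F'\t' '
    { if (!($1 in n)) n[$1] = 0 }   # remember every sample, even with zero hits
    $2 ~ /[1-9]/ { n[$1]++ }        # genotype carries at least one ALT allele
    END { for (s in n) print s "\t" n[s] }
  ' | sort
}

# Assumed usage (chunk_*.vcf.gz stands in for the annotated chunks):
# for f in chunk_*.vcf.gz; do
#   bcftools +split-vep "$f" -i 'LoF="HC"' -f '[%SAMPLE\t%GT\n]'
# done | count_hc_per_sample > hc_counts_per_sample.txt
```

Because the counts are accumulated from a stream, the chunks can be processed one after another (or in parallel, with the per-chunk tallies summed afterwards) without ever writing the full sample-by-variant table to disk.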
Jorge Amigo, thank you so much for taking the time to reply! You are right, this is a much better strategy. I will set it running and let you know if I run into any trouble (it's 88,000 people in a chunked multisample VCF file).
All the best