I have a multi-sample VCF file (comprising 5 samples) created by GATK HaplotypeCaller. The FORMAT field for each sample contains GT:AD:DP:GQ:PL values. I want to calculate the mean GQ value across all five samples at each site, so that I can filter the VCF file based on the average or cumulative GQ value.
FORMAT field of the VCF file, followed by two sample columns: GT:AD:DP:GQ:PL 1/1:0,21:21:63:736,63,0 0/0:3,0:3:9:0,9,84
Related to the first question: which is more suitable for filtering a VCF, the average GQ or the cumulative GQ?
Thanks.
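For illustration, here is a minimal Python sketch of the per-site calculation described above. It assumes tab-separated VCF data lines with FORMAT in column 9 and samples from column 10 onward; the function name `site_gq_stats` and the chr1/100 coordinates in the example are made up for this sketch and are not from GATK or any library. Missing GQ values (`.`) are skipped rather than counted as zero.

```python
def site_gq_stats(vcf_line):
    """Return (chrom, pos, mean_gq, gq_sum, n_called) for one VCF data line.

    Hypothetical helper: assumes a tab-separated line with FORMAT in
    column 9 and one column per sample after it.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    keys = fields[8].split(":")
    if "GQ" not in keys:
        return chrom, pos, None, 0.0, 0
    gq_idx = keys.index("GQ")

    values = []
    for sample in fields[9:]:
        parts = sample.split(":")
        # Trailing FORMAT fields may be dropped, and GQ may be "." (missing).
        if gq_idx < len(parts) and parts[gq_idx] not in (".", ""):
            values.append(float(parts[gq_idx]))

    gq_sum = sum(values)
    mean_gq = gq_sum / len(values) if values else None
    return chrom, pos, mean_gq, gq_sum, len(values)


# Example using the two sample columns quoted in the question
# (chromosome and position are invented for the example).
line = "\t".join([
    "chr1", "100", ".", "A", "G", "50", ".", ".",
    "GT:AD:DP:GQ:PL", "1/1:0,21:21:63:736,63,0", "0/0:3,0:3:9:0,9,84",
])
print(site_gq_stats(line))  # ('chr1', 100, 36.0, 72.0, 2)
```

When every sample has a GQ value at every site, the cumulative GQ is just the mean multiplied by the number of samples, so the two give the same filtering result once the threshold is scaled. They differ only at sites where some genotypes are missing, where the mean is not penalised by the missing samples.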
This is what I have been looking for. I applied it to my work and it went well. I now have chrom, pos, and sample with their respective GQ values. How do I get the average across all the samples for each site (i.e., each chrom and pos)? I need the average so that I can plot it in R.
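One possible way to do this, sketched in Python under the assumption that the table is available as rows of (chrom, pos, sample, GQ): group the rows by (chrom, pos) and average the GQ values in each group. The function name `mean_gq_per_site` and the example rows are hypothetical, and missing GQ values (`.`) are skipped.

```python
from collections import defaultdict


def mean_gq_per_site(rows):
    """Average GQ per (chrom, pos) from (chrom, pos, sample, gq) rows."""
    totals = defaultdict(lambda: [0.0, 0])  # site -> [sum of GQ, count]
    for chrom, pos, _sample, gq in rows:
        if gq in (".", "", None):
            continue  # skip missing values instead of treating them as 0
        acc = totals[(chrom, pos)]
        acc[0] += float(gq)
        acc[1] += 1
    return {site: gq_sum / n for site, (gq_sum, n) in totals.items()}


# Invented example rows; sample names and positions are placeholders.
rows = [
    ("chr1", 100, "S1", "63"),
    ("chr1", 100, "S2", "9"),
    ("chr1", 200, "S1", "."),
    ("chr1", 200, "S2", "40"),
]
print(mean_gq_per_site(rows))  # {('chr1', 100): 36.0, ('chr1', 200): 40.0}
```

The resulting site-to-mean mapping can be written out as a three-column text file (chrom, pos, mean GQ) and read into R for plotting.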