Hi,
I have sequenced (WES) some paired blood and saliva samples along with the NA12878 standard sample. I used a bioinformatics pipeline to generate raw VCFs for all samples and the standard. Accuracy, recall, and F1-score are over 99% for SNPs, but for INDELs they are worse (only 97%). I would like to improve these results, so I decided to use GATK VQSR. Is it the right approach to use the NA12878 standard dataset (GIAB) together with my sequenced NA12878 sample in VariantRecalibrator to build the model, and then use this model to filter my samples? Should I provide both the sequenced and the standard callsets in the resource field and use one of my samples as input, or do I need to pass the sequenced NA12878 under the -V option and put only the NA12878 standard into the resource?
Thanks in advance for your answer,