I am generating some VCFs from WES. I ran the BAMs through the standard GATK workflow with no problem, but then realized that sites that are not represented in my final VCF could be:
A. filtered out because of low quality, or
B. filtered out because they are non-variant
My original plan was re-running GATK to preserve non-variant sites at each step. This was quite a battle which I ultimately lost. Are there standard approaches to determine if sites were removed during calling because everyone was homozygous ref or because of low quality?
I am struggling quite a bit with this. It seems like a standard thing someone would want to know, so I am worried I am missing something obvious.
I'm sorry, what does that mean?
Oops, sorry: sites could be filtered out because they are non-variant, i.e. all samples are homozygous reference. If I understand correctly, these sites are typically removed while running GATK.
Yes, sites that are hom-ref for all samples are not included in the VCF file. They are included as blocks in gVCF files, though. You'll need to look at your pipeline for hard filters. GATK usually uses soft filters to mark entries as PASS or <FILTER_NAME> to denote if a variant passed a QC filter, unless a parameter was used that hard-filters variants.
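As a rough illustration of the check described above, here is a minimal Python sketch that looks a site up first in the final VCF (reporting its FILTER value) and then in a gVCF (reporting whether it falls inside a reference block). The function names `parse_records` and `site_status` are made up for this example. It assumes well-formed, tab-separated records and GATK-style gVCF reference blocks, i.e. ALT equal to `<NON_REF>` with an `END=` key in INFO. It scans linearly, so it is only meant for spot-checking a handful of positions.

```python
def parse_records(lines):
    """Yield (chrom, start, end, filter, alt) for each data line of a VCF/gVCF."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, filt, info = f[0], int(f[1]), f[3], f[4], f[6], f[7]
        end = pos + len(ref) - 1  # default span: length of the REF allele
        for item in info.split(";"):
            if item.startswith("END="):  # gVCF blocks carry their end in INFO
                end = int(item[4:])
        yield chrom, pos, end, filt, alt


def site_status(chrom, pos, vcf_lines, gvcf_lines):
    """Classify one site (hypothetical helper, not part of GATK)."""
    # 1. Is the site in the final VCF? If so, report its FILTER column.
    for c, start, end, filt, _alt in parse_records(vcf_lines):
        if c == chrom and start <= pos <= end:
            # "." means no filters were applied to this record
            return "in VCF: " + ("unfiltered" if filt == "." else filt)
    # 2. Otherwise, is it covered by a record in the gVCF?
    for c, start, end, _filt, alt in parse_records(gvcf_lines):
        if c == chrom and start <= pos <= end:
            if alt == "<NON_REF>":
                return "gVCF reference block"
            return "gVCF variant record only"
    return "absent from both"
```

For example, with the lines of each file read into lists, `site_status("chr1", 12345, vcf_lines, gvcf_lines)` returns one of the strings above. A site reported as "gVCF reference block" was dropped as non-variant rather than by a filter, although it is still worth checking the block's GQ and DP, since a low-confidence block may reflect poor coverage rather than a confident hom-ref call.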