Hello. I am searching whole-genome (WGS) and whole-exome (WES) sequencing data from patients with rare diseases for potential disease-causing variants, so the accuracy of every genotype in every patient is vital. I'm using GATK 4 to joint-call genotypes across the patient cohort, and I filter out genotypes with low DP or low GQ by setting them to missing. I noticed that some of the called genotypes are located in simple tandem repeat regions (according to RepeatMasker).
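A minimal sketch of the kind of genotype-level filter described above, in plain Python rather than any GATK tool. The thresholds (`MIN_DP = 10`, `MIN_GQ = 20`), the function name `mask_genotype`, and the choice to treat a missing DP or GQ as a failure are all assumptions made for illustration, not values from my pipeline or recommended cutoffs.

```python
from typing import Optional

# Hypothetical thresholds, chosen only for illustration.
MIN_DP = 10
MIN_GQ = 20

MISSING_GT = "./."


def mask_genotype(
    gt: str,
    dp: Optional[int],
    gq: Optional[int],
    min_dp: int = MIN_DP,
    min_gq: int = MIN_GQ,
) -> str:
    """Return the genotype unchanged if it passes, otherwise a missing call.

    Assumption: a genotype without a DP or GQ value is treated as failing.
    """
    if dp is None or gq is None:
        return MISSING_GT
    if dp < min_dp or gq < min_gq:
        return MISSING_GT
    return gt


if __name__ == "__main__":
    # (genotype, DP, GQ) for three hypothetical samples at one site.
    calls = [("0/1", 35, 99), ("1/1", 4, 12), ("0/0", 22, 15)]
    print([mask_genotype(gt, dp, gq) for gt, dp, gq in calls])
```

Only the failing genotype is set to missing; the site itself and the other samples' calls are kept.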
Given that NGS performs poorly in repetitive regions of the genome, genotypes called in those regions are presumably of lower quality, but I do not know how to handle them. Some sources suggest filtering out in-frame indels called in those regions but retaining the rest. What about SNPs and frameshift indels? What is the rationale for retaining or filtering particular types of variants in simple tandem repeat regions?
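To make the variant categories in the question concrete, here is a rough sketch of how each call could be tagged by type and by overlap with a repeat interval before deciding what to filter. It assumes repeat intervals are given as 0-based, half-open `(chrom, start, end)` tuples (BED-style, e.g. exported from a RepeatMasker track), that variant positions are 1-based as in VCF, and that the variant's footprint is the span of its REF allele. The function names are hypothetical, and the in-frame versus frameshift split is only meaningful for indels inside coding sequence.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def variant_class(ref: str, alt: str) -> str:
    """Classify a single REF/ALT pair by allele lengths only."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    # Length change divisible by 3 keeps the reading frame (coding regions only).
    if abs(len(ref) - len(alt)) % 3 == 0:
        return "inframe_indel"
    return "frameshift_indel"


def in_repeat(chrom: str, pos: int, ref: str, repeats: Iterable[Interval]) -> bool:
    """True if the REF span of a 1-based VCF position overlaps any repeat."""
    v_start = pos - 1            # convert to 0-based
    v_end = v_start + len(ref)   # half-open end
    return any(
        c == chrom and start < v_end and v_start < end
        for c, start, end in repeats
    )


if __name__ == "__main__":
    repeats = [("chr1", 100, 120)]  # hypothetical simple tandem repeat
    variants = [("chr1", 105, "A", "G"), ("chr1", 110, "ATTT", "A"), ("chr1", 300, "AT", "A")]
    for chrom, pos, ref, alt in variants:
        print(chrom, pos, variant_class(ref, alt), in_repeat(chrom, pos, ref, repeats))
```

A linear scan over the intervals is enough for a sketch; a genome-wide repeat track would normally call for an interval index instead.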