Question

Should we trust genotypes called in simple tandem repeat regions?

3.1 years ago

samuelandjw ▴ 260

Hello. I am searching genomes (WGS) or exomes (WES) of patients with rare diseases for potential disease-causing variants. The accuracy of each genotype for each patient is vital. I'm using GATK 4 to perform joint-calling of genotypes of the patient cohort. I filter out genotypes with low DP and low GQ (by setting those genotypes to missing). I noticed that some called genotypes are located in simple tandem repeat regions (according to RepeatMasker).
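To make "setting genotypes to missing" concrete, here is a minimal sketch of that kind of per-genotype filter. The thresholds (`MIN_DP`, `MIN_GQ`) and the function name `mask_genotype` are illustrative assumptions, not values or code from the actual pipeline.

```python
from typing import Optional

# Assumed, illustrative thresholds; the post does not state the real cutoffs.
MIN_DP = 10
MIN_GQ = 20


def mask_genotype(gt: str, dp: Optional[int], gq: Optional[int],
                  min_dp: int = MIN_DP, min_gq: int = MIN_GQ) -> str:
    """Return the genotype unchanged, or './.' if DP or GQ is missing or too low."""
    if dp is None or gq is None:
        return "./."
    if dp < min_dp or gq < min_gq:
        return "./."
    return gt


if __name__ == "__main__":
    # (GT, DP, GQ) triples as they might be parsed from a VCF sample column.
    for gt, dp, gq in [("0/1", 30, 99), ("0/1", 5, 99), ("1/1", 25, 12)]:
        print(gt, dp, gq, "->", mask_genotype(gt, dp, gq))
```

Note that this filter looks only at DP and GQ, so a genotype inside a tandem repeat can pass it.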

Given that NGS performs poorly in repetitive regions of the genome, genotypes called in those regions are expected to be of lower quality, but I do not know what to do with them. Some sources suggest filtering out in-frame indels called in those regions but retaining the rest. What about SNPs and frameshift indels? What is the rationale for retaining or filtering particular types of variants in simple tandem repeat regions?

sequencing GATK WGS WES • 661 views

updated 3.1 years ago by Istvan Albert 102k • written 3.1 years ago by samuelandjw ▴ 260

score 1 · Answer 1 · 2021-11-11

The problem in low complexity regions is that the alignments themselves may be fundamentally incorrect, so it can be extremely challenging to determine which variant is present from short reads alone.

In the paper that you cite they state:

In all 35 cases, the single nucleotide variant (SNV) was confirmed by Sanger sequencing.

In the end, perhaps that is the only way to know for sure, particularly when you are detecting a novel variant that falls into a low complexity region.
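If Sanger confirmation is the fallback, one practical step is to shortlist which calls overlap annotated repeats so they can be prioritized for validation. The following is a minimal sketch under stated assumptions: repeat intervals come from a BED-style annotation (0-based, half-open), variant positions are 1-based VCF `POS` values, and the function names `build_index` and `in_repeat` are hypothetical helpers, not part of GATK or RepeatMasker.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(bed_rows):
    """Index (chrom, start, end) rows; coordinates are 0-based, half-open (BED)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_rows:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        # Merge overlapping or touching intervals so a single lookup suffices.
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_repeat(index, chrom, pos, ref):
    """True if the REF span of a variant (1-based POS) overlaps a repeat interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    v_start = pos - 1            # convert to 0-based
    v_end = v_start + len(ref)   # half-open end of the REF allele
    # Last merged interval that starts before the variant ends.
    i = bisect_right(starts, v_end - 1) - 1
    return i >= 0 and ends[i] > v_start


if __name__ == "__main__":
    # Made-up intervals for illustration only.
    idx = build_index([("chr1", 100, 120), ("chr1", 110, 130), ("chr2", 5, 10)])
    for chrom, pos, ref in [("chr1", 101, "A"), ("chr1", 100, "A"), ("chr1", 99, "ACG")]:
        print(chrom, pos, ref, in_repeat(idx, chrom, pos, ref))
```

This only flags overlap with the annotation. It says nothing about whether the call is correct, which is why orthogonal confirmation is still needed.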