I have whole-genome sequencing data from different sources, and the sequencing depth varies greatly among samples. I joint-called SNPs across all samples. When filtering SNPs after calling, I found that genotype quality (GQ) is influenced by sequencing depth. For example, I set GQ >= 30 as my threshold, and any genotype below it is set to missing. After the subsequent max-missing filter, only very few SNPs are retained, which cannot meet the requirements of the downstream analyses. Even after lowering the threshold to 20, the remaining SNPs are still inadequate. So I calculated, for each individual, the proportion of genotypes that meet the threshold, and found that the higher an individual's sequencing depth, the higher its proportion of passing genotypes. How can I fix this problem? Can I just skip the GQ filtering? Would that affect later analyses, like population structure analysis, demographic analysis, or selective sweep identification?
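In case it helps anyone reproduce the per-individual check described above, here is a minimal sketch in plain Python that computes the fraction of genotypes passing a GQ threshold per sample. The embedded mini-VCF and the sample names (`lowDP`, `highDP`) are made-up illustration data, not real calls; on a real file you could get the same numbers by parsing the output of `bcftools query -f '[%SAMPLE\t%GQ\n]'` instead.

```python
# Per-sample proportion of genotypes with GQ >= threshold.
# The tiny inline VCF below is hypothetical example data.

GQ_THRESHOLD = 30

vcf_text = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tlowDP\thighDP
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:12\t0/1:99
chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT:GQ\t0/0:25\t1/1:80
chr1\t300\t.\tG\tA\t70\tPASS\t.\tGT:GQ\t0/1:35\t0/1:45
"""

def gq_pass_rate(vcf, threshold):
    """Return {sample: fraction of its genotypes with GQ >= threshold}."""
    samples, total, passed = [], {}, {}
    for line in vcf.splitlines():
        if line.startswith("##"):
            continue  # skip meta-information header lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]           # sample columns start at index 9
            total = {s: 0 for s in samples}
            passed = {s: 0 for s in samples}
            continue
        gq_idx = fields[8].split(":").index("GQ")  # position of GQ in FORMAT
        for sample, call in zip(samples, fields[9:]):
            gq = int(call.split(":")[gq_idx])
            total[sample] += 1
            if gq >= threshold:
                passed[sample] += 1
    return {s: passed[s] / total[s] for s in samples}

rates = gq_pass_rate(vcf_text, GQ_THRESHOLD)
for sample, rate in rates.items():
    print(f"{sample}\t{rate:.2f}")  # lowDP passes 1/3 sites, highDP 3/3
```

With these toy numbers the low-depth sample passes at 0.33 and the high-depth sample at 1.00, mirroring the depth-dependent pattern described in the question.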
See VQSR (https://gatk.broadinstitute.org/hc/en-us/articles/360035531612-Variant-Quality-Score-Recalibration-VQSR) and hard filtering (https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants).
The species I study is not a model species, so VQSR cannot be applied here. I have already used GATK hard filtering. What I mean is: after hard filtering, is genotype-level GQ filtering still necessary?
I have the same question!