According to the GATK Best Practices, separate VQSR models should be built for SNPs and INDELs, because the annotations of high-quality SNPs and INDELs are systematically different (if I understand it correctly). Annotations for good variants on autosomes could likewise differ from those on the X chromosome. For example, DP for good variants on the X chromosome could be substantially smaller than DP on autosomes when the cohort includes a large number of male samples. It therefore seems reasonable to build VQSR models separately for the X chromosome and the autosomes. However, the GATK website gives no such advice.
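To make the DP argument concrete, here is a rough back-of-the-envelope sketch. It assumes that read depth scales with copy number, that coverage is otherwise uniform, and it ignores the pseudoautosomal regions and mapping effects. The function and parameter names are made up for illustration and are not part of GATK.

```python
def expected_total_dp(n_samples, male_fraction, diploid_depth):
    """Expected cohort-level DP at an autosomal site vs. a non-PAR chrX site.

    Assumption: depth is proportional to copy number, so a male sample
    (one X copy) contributes about half the diploid depth on chrX.
    """
    n_male = n_samples * male_fraction
    n_female = n_samples - n_male
    autosome_dp = n_samples * diploid_depth
    chrx_dp = n_female * diploid_depth + n_male * diploid_depth / 2
    return {
        "autosome": autosome_dp,
        "chrX": chrx_dp,
        "ratio": chrx_dp / autosome_dp,
    }


# 100 samples, half of them male, 30x per diploid genome
print(expected_total_dp(100, 0.5, 30))
```

Under these assumptions the chrX/autosome DP ratio is 1 - male_fraction / 2, so a cohort that is half male would show about 75% of the autosomal DP on the X chromosome.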
My question is:
Should we build VQSR models separately for the autosomes and the X chromosome? If yes, we would have four VQSR models: Auto SNPs, Auto INDELs, X SNPs and X INDELs.
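As a sketch of what the four-way split would mean in practice, the hypothetical helper below assigns a variant record to one of the four groups from its contig, REF and ALT alleles. It is only an illustration: the function name is made up, contigs are assumed to be named either "1".."22"/"X" or "chr1".."chr22"/"chrX", and any record with an allele longer than one base is simply treated as an INDEL, which is cruder than GATK's own variant typing.

```python
AUTOSOMES = {str(i) for i in range(1, 23)}


def model_group(chrom, ref, alts):
    """Return the VQSR model group for one record, or None if it fits none."""
    # Accept both "chr1" and "1" style contig names (assumption).
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name in AUTOSOMES:
        region = "Auto"
    elif name == "X":
        region = "X"
    else:
        return None  # Y, MT, unplaced contigs: not covered by the four models

    # Simplification: SNP only if REF and every ALT are single bases.
    is_snp = len(ref) == 1 and all(len(a) == 1 for a in alts)
    return f"{region} {'SNPs' if is_snp else 'INDELs'}"


print(model_group("chr1", "A", ["G"]))
print(model_group("X", "AT", ["A"]))
```

Each group would then be recalibrated with its own model, provided it contains enough variants overlapping the training resources.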
Is this for WES or WGS? If it's for WES, it's not advised to use DP in VQSR.
WGS. I guess DP is not the only annotation that differs between the X chromosome and the autosomes.
But I doubt that the X/Y chromosomes alone provide enough variants to train the model. Maybe we should use hard filtering on the sex chromosomes?
For the Y chromosome, VQSR could be impossible: at least for hg19, there are no reference true-positive variants on the Y chromosome, so the model cannot be trained. For the X chromosome, it is a matter of sample size. But if you are doing hard filtering for the X/Y chromosomes, what thresholds would you suggest? The same as for the autosomes?