I was looking over GATK4's pipeline for Germline Short Variant Discovery (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-), and what I could not work out is where sample-level filtering occurs. Someone told me that I should filter out low-quality samples before joint calling with GenotypeGVCFs, but I cannot find any details on this in GATK's pipeline.
Sample level filter examples:
- Freemix
- Other hard-filtering metrics from BCFtools stats (unfortunately, I forgot where I got this screenshot from).
Some help figuring this out would be amazing. Thank you.
Check out the Individual based statistics section from this link: https://speciationgenomics.github.io/filtering_vcfs/
Low mean sequencing depth and a high level of missingness are good criteria for removing individuals from a cohort.
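As a rough illustration of those two criteria, here is a minimal Python sketch that flags samples from per-individual tables like the ones `vcftools --depth` (`.idepth`, with a `MEAN_DEPTH` column) and `vcftools --missing-indv` (`.imiss`, with an `F_MISS` column) produce. The function names, the inline example data, and the thresholds (mean depth below 10, missingness above 0.1) are assumptions for illustration only; sensible cutoffs depend on your own cohort and should come from looking at the distributions first.

```python
import csv
import io


def read_tsv(handle):
    """Parse a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def flag_low_quality(idepth_rows, imiss_rows, min_mean_depth=10.0, max_f_miss=0.1):
    """Return {sample: [reasons]} for samples failing either threshold.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    depth = {row["INDV"]: float(row["MEAN_DEPTH"]) for row in idepth_rows}
    f_miss = {row["INDV"]: float(row["F_MISS"]) for row in imiss_rows}

    flagged = {}
    for sample in sorted(depth.keys() | f_miss.keys()):
        reasons = []
        if sample in depth and depth[sample] < min_mean_depth:
            reasons.append("low_depth")
        if sample in f_miss and f_miss[sample] > max_f_miss:
            reasons.append("high_missingness")
        if reasons:
            flagged[sample] = reasons
    return flagged


# Made-up example tables in the .idepth / .imiss column layout.
IDEPTH_TEXT = (
    "INDV\tN_SITES\tMEAN_DEPTH\n"
    "S1\t1000\t25.3\n"
    "S2\t1000\t4.1\n"
    "S3\t1000\t18.0\n"
)
IMISS_TEXT = (
    "INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n"
    "S1\t1000\t0\t10\t0.01\n"
    "S2\t1000\t0\t400\t0.4\n"
    "S3\t1000\t0\t150\t0.15\n"
)

flagged = flag_low_quality(
    read_tsv(io.StringIO(IDEPTH_TEXT)),
    read_tsv(io.StringIO(IMISS_TEXT)),
)

for sample, reasons in flagged.items():
    print(sample, ",".join(reasons))
```

With real data you would pass open file handles for your `.idepth` and `.imiss` files instead of the inline strings. The flagged sample names can then be written one per line to a text file and excluded with something like `vcftools --remove` or `bcftools view -S ^file`; if you are filtering before joint calling, simply leaving those samples' gVCFs out of the joint-calling input should have the same effect.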