Sample Level Filtering During GATK Germline Short Variant Discovery
20 months ago
jon.klonowski ▴ 210

I was looking over GATK4's pipeline for Germline Short Variant Discovery (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-) and what I could not work out is where sample-level filtering occurs. Someone told me that I should filter out low-quality samples before joint calling with GenotypeGVCFs, but I cannot find any details of this in GATK's pipeline documentation.

Sample level filter examples:

  • Freemix
  • Other hard-filtering metrics from BCFtools stats (screenshot of unknown origin; unfortunately I forgot where I got it from)

Some help figuring this out would be amazing. Thank you.

Genotyping genomics vcf GATK germline • 1.3k views

Check out the Individual based statistics section from this link: https://speciationgenomics.github.io/filtering_vcfs/

Low mean sequencing depth and a high level of missingness are good criteria for removing individuals from a cohort.
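As a rough illustration of that kind of per-individual check, here is a minimal Python sketch. It assumes you already have per-individual tables in the layout VCFtools writes for --depth (.idepth, with INDV and MEAN_DEPTH columns) and --missing-indv (.imiss, with INDV and F_MISS columns). The function names, the thresholds (10x mean depth, 20% missingness), and the example rows are all made up for illustration and would need tuning for a real cohort.

```python
import csv
import io


def read_table(text, key, value):
    """Map the `key` column to a float taken from the `value` column
    of a tab-separated table with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: float(row[value]) for row in reader}


def failing_individuals(idepth_text, imiss_text,
                        min_mean_depth=10.0, max_missing=0.2):
    """Return {sample: [reasons]} for samples that fail either cutoff.

    The default cutoffs are placeholders, not recommendations.
    A sample absent from one table is treated as failing that check.
    """
    depth = read_table(idepth_text, "INDV", "MEAN_DEPTH")
    missing = read_table(imiss_text, "INDV", "F_MISS")
    failed = {}
    for sample in sorted(depth.keys() | missing.keys()):
        reasons = []
        if depth.get(sample, 0.0) < min_mean_depth:
            reasons.append("low_depth")
        if missing.get(sample, 1.0) > max_missing:
            reasons.append("high_missingness")
        if reasons:
            failed[sample] = reasons
    return failed


# Made-up example tables (not real data).
IDEPTH = (
    "INDV\tN_SITES\tMEAN_DEPTH\n"
    "S1\t1000\t25.3\n"
    "S2\t1000\t4.1\n"
    "S3\t1000\t18.0\n"
)
IMISS = (
    "INDV\tN_DATA\tN_GENOTYPES_FILTERED\tN_MISS\tF_MISS\n"
    "S1\t1000\t0\t10\t0.01\n"
    "S2\t1000\t0\t450\t0.45\n"
    "S3\t1000\t0\t300\t0.30\n"
)

if __name__ == "__main__":
    for sample, reasons in failing_individuals(IDEPTH, IMISS).items():
        print(sample, ",".join(reasons))
```

Reporting the reason alongside each flagged sample makes it easier to decide case by case rather than dropping everything that misses a cutoff.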

20 months ago
dthorbur ★ 2.5k

You could assess the GVCF files emitted by HaplotypeCaller, run with an appropriate --annotation-group or --annotation parameter for your filters. Then just don't include the failed samples in the GenomicsDBImport command.
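One way to leave failed samples out is to generate the sample map that GenomicsDBImport reads via --sample-name-map (a tab-separated file with one sample name and GVCF path per line). A minimal Python sketch follows, assuming you have already decided which samples failed; the function name, sample names, and paths are hypothetical.

```python
def build_sample_map(gvcf_paths, failed_samples):
    """Return sample-map text (sample<TAB>path per line),
    skipping every sample listed in `failed_samples`."""
    return "".join(
        f"{sample}\t{path}\n"
        for sample, path in sorted(gvcf_paths.items())
        if sample not in failed_samples
    )


# Hypothetical sample names and file paths, for illustration only.
GVCFS = {
    "S1": "gvcfs/S1.g.vcf.gz",
    "S2": "gvcfs/S2.g.vcf.gz",
    "S3": "gvcfs/S3.g.vcf.gz",
}
FAILED = {"S2"}

if __name__ == "__main__":
    with open("cohort.sample_map", "w") as handle:
        handle.write(build_sample_map(GVCFS, FAILED))
```

Keeping the full list of GVCFs in one place and the failed set in another means the same script can produce both the "all samples" and the "filtered" map.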

Alternatively, and this is what I would do, run the GATK pipeline through the variant filtering step (VQSR or hard filtering with VariantFiltration). Then you can assess how your low-quality samples fared. For example, a lack of depth in one sample can sometimes be made up for in joint calling, because the questionable SNP was called with high confidence in other samples.

If you think some samples still need to be removed, you can rerun the pipeline from GenomicsDBImport. In this scenario, you can also quantify the impact your "low quality" samples had on the calls. Ultimately, running GATK with a small number of relatively poor samples isn't going to cause any damage: variants are first called per sample before being merged and reannotated.
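To put a number on that impact, one simple option is to compare the site lists of the two joint-called VCFs. The Python sketch below is only an assumption about how you might do it: it keys variants on (CHROM, POS, REF, ALT), splits multi-allelic ALT fields, and ignores genotypes, filters, and normalization differences. The function names and the two tiny VCF snippets are made up.

```python
def variant_keys(vcf_text):
    """Collect (chrom, pos, ref, alt) for every ALT allele in uncompressed VCF text."""
    keys = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_callsets(all_samples_vcf, after_removal_vcf):
    """Count alleles shared by, or unique to, the two call sets."""
    before = variant_keys(all_samples_vcf)
    after = variant_keys(after_removal_vcf)
    return {
        "shared": len(before & after),
        "only_with_all_samples": len(before - after),
        "only_after_removal": len(after - before),
    }


HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
# Made-up records, not real calls.
WITH_ALL = HEADER + (
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t200\t.\tC\tT,G\t60\tPASS\t.\n"
)
AFTER_REMOVAL = HEADER + (
    "chr1\t100\t.\tA\tG\t55\tPASS\t.\n"
    "chr1\t200\t.\tC\tT\t60\tPASS\t.\n"
    "chr1\t300\t.\tG\tA\t40\tPASS\t.\n"
)

if __name__ == "__main__":
    print(compare_callsets(WITH_ALL, AFTER_REMOVAL))
```

For real data you would likely want to normalize both files first and also compare genotype concordance for the retained samples, which this sketch does not attempt.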


Thank you very much! Will look into all of that!
