Question

No QD scores on 10% of observations in VCF?

6.6 years ago

dthorbur ★ 2.6k

I am hard-filtering my first VCF and am currently exploring the annotation scores to set my thresholds. I'm basing this mostly on the GATK Best Practices workflow on their website, as that's how I've generated all of my VCFs.

When I extracted QD (Quality by Depth) scores, I noticed that 32573 of the 323529 entries were NAs. Upon closer inspection of half a dozen or so, the score was simply not present in the list. Below are single entries [1] without and [2] with the score:

[1]"AC=1;AF=0.083;AN=12;BaseQRankSum=-1.645e+00;ClippingRankSum=0.00;DP=26;ExcessHet=3.0103;FS=3.979;MLEAC=1;MLEAF=0.083;MQ=57.80;MQRankSum=-5.240e-01;QD=7.31;ReadPosRankSum=-6.740e-01;SOR=0.859"

[2]"AC=10;AF=1.00;AN=10;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=10;MLEAF=1.00;MQ=60.00;QD=31.81;SOR=1.127"

To generate the VCFs I used GenotypeGVCFs in GATK v4.0.2.1 on a population of 6 individuals, and requested standard annotations. There appears to be a huge discrepancy in the amount of information generated from one entry to the next. Is this correct? If not, what would you suggest I do next? It just seems like an awful lot of data to lose before I've even set thresholds.
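For anyone wanting to reproduce the count of missing QD values, here is a minimal Python sketch that splits INFO strings into key/value pairs and lists the records lacking a given annotation. The helper names `parse_info` and `missing_annotation` are made up for illustration, and the third record is a hypothetical entry added only to show the missing case; it is not taken from my VCF.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys (no '=') map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue  # skip empty pieces and the VCF missing-value placeholder
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def missing_annotation(infos, key="QD"):
    """Return the INFO strings that do not contain the given annotation key."""
    return [info for info in infos if key not in parse_info(info)]


records = [
    # The two entries quoted above
    "AC=1;AF=0.083;AN=12;BaseQRankSum=-1.645e+00;ClippingRankSum=0.00;DP=26;"
    "ExcessHet=3.0103;FS=3.979;MLEAC=1;MLEAF=0.083;MQ=57.80;"
    "MQRankSum=-5.240e-01;QD=7.31;ReadPosRankSum=-6.740e-01;SOR=0.859",
    "AC=10;AF=1.00;AN=10;DP=20;ExcessHet=3.0103;FS=0.000;MLEAC=10;MLEAF=1.00;"
    "MQ=60.00;QD=31.81;SOR=1.127",
    # Hypothetical entry with no QD key, for illustration only
    "AN=12;DP=14;ExcessHet=3.0103;MQ=60.00",
]

missing = missing_annotation(records, key="QD")
print(f"{len(missing)} of {len(records)} records have no QD annotation")
```

On a real file the INFO strings would come from the eighth tab-separated column of each non-header line, so the same check can be run over the whole VCF rather than a hand-picked list.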

Thanks in advance.

GATK QD filtering vcf • 1.8k views

updated 3.6 years ago by PeiwenLi • 0 • written 6.6 years ago by dthorbur ★ 2.6k

Hello,

If I understood the GenotypeGVCFs manual correctly, it needs gVCF files produced by HaplotypeCaller (or CombineGVCFs). How did you do your variant calling?

fin swimmer

6.6 years ago by finswimmer 16k

Hi,

Correct: you go through HaplotypeCaller, then GenomicsDBImport, and finally GenotypeGVCFs, which produces a normal VCF file with all SNPs and indels from your samples. This is the current recommended GATK pipeline for variant calling.

See the workflow here:

6.6 years ago by dthorbur ★ 2.6k

Hi, I noticed the same pattern in my VCF output from GenotypeGVCFs in GATK v4.1.0.0. Did you manage to fix your issue? Thanks, Peiwen

3.6 years ago by PeiwenLi • 0