Question

Very different QUAL scores on VCFs of same sample using different methods

17 months ago

Victor ▴ 20

Hello,

I produced a few VCFs using Clara Parabricks DeepVariant and GATK HaplotypeCaller (both the regular way and with the Clara Parabricks HaplotypeCaller, which yielded identical results).

The problem is that the QUAL scores from the two callers differ enormously: with HaplotypeCaller many variants scored in the thousands, while with DeepVariant most were in the 30s to 50s.

Here is an example of a variant we are interested in:

chr5    112839942   .   C   T   2002.64 .   AC=1;AF=0.500;AN=2;BaseQRankSum=0.126;DP=187;ExcessHet=3.0103;FS=2.554;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=10.94;ReadPosRankSum=-3.706;SOR=0.775    GT:AD:DP:GQ:PL  0/1:91,92:183:99:2010,0,1931

The example above was made using HaplotypeCaller, while the one below was made using DeepVariant.

chr5    112839942   .   C   T   35.1    PASS    .   GT:GQ:DP:AD:VAF:PL  0/1:34:184:92,92:0.5:35,0,41

I am aware that the latter example is filtered and is lacking a column, but I'm wondering why there is such a massive difference between the two quality scores. If anyone could give me a clue I'd be very thankful!

Thanks for your time.
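For reference, the VCF specification defines QUAL as a Phred-scaled value, so both scores can be converted to an error probability for comparison. The snippet below is only a rough sketch using the numbers from the two records above; the helper name `phred_to_prob` is made up for illustration, and it says nothing about how either caller derives its score internally.

```python
def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled score Q to a probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

# QUAL values copied from the two records in this post.
hc_qual = 2002.64   # HaplotypeCaller
dv_qual = 35.1      # DeepVariant

print(f"HaplotypeCaller QUAL {hc_qual} -> p(wrong) = {phred_to_prob(hc_qual):.3e}")
print(f"DeepVariant     QUAL {dv_qual} -> p(wrong) = {phred_to_prob(dv_qual):.3e}")

# Arithmetic check on the HaplotypeCaller record only: dividing QUAL by the
# per-sample depth (DP=183 in the FORMAT column) gives the reported QD=10.94.
hc_sample_depth = 183
print(f"QUAL / depth = {hc_qual / hc_sample_depth:.2f}")
```

Both values correspond to a very small error probability; the HaplotypeCaller number is simply on a scale that grows with depth, which is why it is usually read together with QD.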

deepvariant clara-parabricks haplotypecaller vcf • 1.4k views

ADD COMMENT • link 17 months ago by Victor ▴ 20

I'm afraid the method used to calculate GQ is not defined in the VCF spec. It's up to the caller to produce a value.
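To see how differently the two callers fill in the per-sample fields, the FORMAT and sample columns of the two records in the question can be lined up side by side. This is just a sketch with a hypothetical helper (`parse_sample`); the strings are copied from the question, not produced by any tool here.

```python
def parse_sample(format_col: str, sample_col: str) -> dict:
    """Pair each colon-separated FORMAT key with its sample value."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))

# FORMAT and sample columns copied from the two records in the question.
hc = parse_sample("GT:AD:DP:GQ:PL", "0/1:91,92:183:99:2010,0,1931")
dv = parse_sample("GT:GQ:DP:AD:VAF:PL", "0/1:34:184:92,92:0.5:35,0,41")

# Print only the keys that both callers report.
for key in ("GT", "AD", "DP", "GQ", "PL"):
    print(f"{key:>2}: HaplotypeCaller={hc[key]:<14} DeepVariant={dv[key]}")
```

The genotype, depth, and allele counts agree almost exactly; only the likelihood-derived fields (GQ, PL, and with them QUAL) are on very different scales.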

ADD REPLY • link 17 months ago by Pierre Lindenbaum 164k

Hi Victor, a few questions to try to diagnose: 1) Was the pre-processing of both samples prior to VCF generation identical? 2) What settings were used to run these samples? The exact commands issued might help. 3) Is there anything else we should know? E.g., in one case the sample was jointly called and in the other it was called singly.

ADD REPLY • link 17 months ago by LauferVA 4.5k

Hello,

The treatment of both was the same: both were generated with Parabricks from the exact same .bam file (made with fq2bam) from WES sequencing. One used Clara Parabricks DeepVariant and the other used Parabricks HaplotypeCaller. We also tried both regular GATK HaplotypeCaller and Parabricks germline (which generates BAMs and VCFs, also using HaplotypeCaller). All the methods using HaplotypeCaller gave the same results.

With the exception of DeepVariant, which used the --use-wes-model flag, everything else was run with defaults. Each sample was called individually.

Thanks for the help!

ADD REPLY • link 17 months ago by Victor ▴ 20

So, I've not used these tools or read the docs in detail, so please critically evaluate this...

Having said that, check out the haplotype caller page, which states:

This tool applies an accelerated GATK CollectMultipleMetrics for assessing the metrics of a BAM file, such as including alignment success, quality score distributions, GC bias, and sequencing artifacts. This functions as a ‘meta-metrics’ tool, and can run any combination of the available metrics tools in GATK to assess overall how well a sequencing run has performed. The available metrics tools (PROGRAMs) can be found in the command line example below.

Overall, this seems to me to be saying that the niche application of the haplotype caller tool is to generate sample metadata. That GATK and NVIDIA's implementation of GATK give similar or identical results is not surprising (if anything, it is probably a bit reassuring). I would double-check what NVIDIA's workflow is doing; it may just be calling the appropriate commands from GATK...

It seems HaplotypeCaller could be used to optimize/maximize variant call accuracy, but for that you would want to include the BQSR report from another variant caller. Overall, it doesn't seem to me that HaplotypeCaller is really meant to be a dedicated variant caller per se, but rather is meant to help you understand the quality assurance metrics associated with a .bam file.

You could probably check with NVIDIA support to confirm or refute these ideas, if no one else weighs in here.

ADD REPLY • link 17 months ago by LauferVA 4.5k

What is the sequencing depth of the sample?

ADD REPLY • link 17 months ago by LauferVA 4.5k

For this specific sample it is 147 (.584).

ADD REPLY • link 17 months ago by Victor ▴ 20