Merging VCFs from the same sample created by different variant callers and adding a new INFO field for the variant caller
2.6 years ago
Jordi ▴ 60

Hi,

I am generating two distinct SNV/small indel call sets for each sample from WGS data. The two VCFs are generated by GATK HaplotypeCaller and DeepVariant. After filtering out low-quality calls from each VCF, I would now like to merge them into a single VCF to be annotated.

This VCF, however, should keep the information about the variant caller that called each variant.

For example, the two caller-specific VCFs might look like this:

HaplotypeCaller VCF

#CHROM  POS    ID  REF  ALT  QUAL    FILTER  INFO  FORMAT          sampleID
chr1    10583  .   G    A    405.64  PASS    AC=1  GT:AD:DP:GQ:PL  0/1:12,16:28:99:413,0,320
chr1    10622  .   T    G    205.67  PASS    AC=1  GT:AD:DP:GQ:PL  0/1:1,8:9:12:213,0,12
chr1    10623  .   T    C    189.97  PASS    AC=2  GT:AD:DP:GQ:PL  1/1:1,6:7:18:204,18,0

DeepVariant VCF

#CHROM  POS    ID  REF  ALT  QUAL  FILTER  INFO  FORMAT              sampleID
chr1    10583  .   G    A    48.8  PASS    .     GT:GQ:DP:AD:VAF:PL  0/1:50:28:12,16:0.43:413,0,320
chr1    14907  .   A    G    24.4  PASS    .     GT:GQ:DP:AD:VAF:PL  0/1:23:68:22,46:0.676471:24,0,27

The merged VCF should have a new INFO field, VC ("Variant Caller"), set to HC when the variant is found in the HaplotypeCaller VCF and DV when it is found in the DeepVariant VCF. When it is found in both, it should show HC/DV:

#CHROM  POS    ID  REF  ALT  QUAL         FILTER  INFO           FORMAT              sampleID
chr1    10583  .   G    A    405.64/48.8  PASS    VC=HC/DV;AC=1  GT:GQ:DP:AD:VAF:PL  0/1:99:28:12,16:0.43:413,0,320
chr1    10622  .   T    G    205.67       PASS    VC=HC;AC=1     GT:GQ:DP:AD:VAF:PL  0/1:12:9:1,8:0.11:213,0,12
chr1    10623  .   T    C    189.97       PASS    VC=HC;AC=2     GT:GQ:DP:AD:VAF:PL  1/1:18:7:1,6:0.14:204,18,0
chr1    14907  .   A    G    24.4         PASS    VC=DV          GT:GQ:DP:AD:VAF:PL  0/1:23:68:22,46:0.68:24,0,27
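
To make the intended merge concrete, here is a minimal Python sketch of the tagging step. It is not a finished tool, and it rests on several assumptions: both inputs are tab-delimited, single-sample VCFs with one ALT allele per line; variants are matched on exact CHROM, POS, REF and ALT; and, when a variant is called by both tools, the record from the first caller listed is kept unchanged apart from the VC tag. In particular it does not build the combined QUAL (405.64/48.8) or the harmonised FORMAT column shown above. As far as I understand, QUAL has to be a single number in a valid VCF, so the combined value would probably need its own INFO field. The function name merge_calls and the VC header line are my own illustrative choices.

```python
# Hypothetical INFO definition for the new tag; it would need to be added
# to the header of the merged VCF.
VC_HEADER = (
    '##INFO=<ID=VC,Number=1,Type=String,'
    'Description="Variant caller(s) reporting this variant: HC, DV or HC/DV">'
)


def merge_calls(callsets):
    """Merge VCF data lines from several callers and tag each with VC=...

    callsets: list of (caller_tag, iterable_of_vcf_lines) pairs.
    Header lines (starting with '#') and blank lines are skipped.
    Returns a list of tab-delimited data lines sorted by (CHROM, POS, REF, ALT).
    """
    merged = {}  # (chrom, pos, ref, alt) -> [fields of first record seen, [tags]]
    for tag, lines in callsets:
        for line in lines:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            key = (fields[0], int(fields[1]), fields[3], fields[4])
            if key in merged:
                # Already seen from an earlier caller: only record the tag.
                merged[key][1].append(tag)
            else:
                merged[key] = [fields, [tag]]

    out = []
    # Note: CHROM is sorted as a plain string here, not in contig order.
    for key in sorted(merged):
        fields, tags = merged[key]
        vc = "VC=" + "/".join(tags)
        info = fields[7]
        # Prepend VC to the existing INFO, replacing a missing value (".").
        fields[7] = vc if info in (".", "") else vc + ";" + info
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    hc = [
        "chr1\t10583\t.\tG\tA\t405.64\tPASS\tAC=1\tGT:AD:DP:GQ:PL\t0/1:12,16:28:99:413,0,320",
        "chr1\t10622\t.\tT\tG\t205.67\tPASS\tAC=1\tGT:AD:DP:GQ:PL\t0/1:1,8:9:12:213,0,12",
    ]
    dv = [
        "chr1\t10583\t.\tG\tA\t48.8\tPASS\t.\tGT:GQ:DP:AD:VAF:PL\t0/1:50:28:12,16:0.43:413,0,320",
        "chr1\t14907\t.\tA\tG\t24.4\tPASS\t.\tGT:GQ:DP:AD:VAF:PL\t0/1:23:68:22,46:0.676471:24,0,27",
    ]
    print(VC_HEADER)
    for record in merge_calls([("HC", hc), ("DV", dv)]):
        print(record)
```

Because the first caller's record wins, the order of the pairs passed to merge_calls decides whose QUAL and genotype fields survive for shared variants.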

Do you think it makes sense to merge the VCFs in this manner, or would it be better to treat them separately and only merge after annotation using a custom Python script?

Thanks for your help and thoughts!

deepvariant vcf bcftools haplotypecaller • 546 views