I would like to annotate records in one VCF file (input.vcf) with some of the INFO fields of the corresponding records from a database (db.vcf), but only if the recorded mutation matches exactly in the input and in the database. For example, let's say I have three very simple VCF files:
input.vcf
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G A 100 PASS A=3.0
db1.vcf
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G A 100 PASS B=4.0
db2.vcf
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G T 100 PASS B=4.0
Note that db1.vcf and db2.vcf describe different SNPs at the same locus: the SNP in db1.vcf matches the one in input.vcf, but the SNP in db2.vcf does not. I need a tool that can discern such cases and annotate the input record with information from the database only if the mutations match. Is there a tool that does this?
I tried using GATK's VariantAnnotator and vcflib's vcfaddinfo; unfortunately, both ignore the REF and ALT alleles and add the B=4.0 annotation in both cases.
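To make the matching rule concrete, here is a minimal Python sketch of the behaviour I am after. It is only an illustration, not a replacement for a real VCF tool: the function names (`parse_vcf`, `annotate`) are made up for this post, it assumes biallelic records, it splits columns on any whitespace (real VCF is tab-separated), and it does not add the `##INFO` header line that a valid annotated VCF would need. A database record is used only if CHROM, POS, REF and ALT are all identical to the input record.

```python
def parse_vcf(text):
    """Split VCF text into header lines and records (lists of the 8 fixed columns)."""
    header, records = [], []
    for line in text.strip().splitlines():
        if line.startswith("#"):
            header.append(line)
        else:
            records.append(line.split())
    return header, records


def annotate(input_text, db_text, tags):
    """Copy the given INFO tags from db records whose CHROM, POS, REF, ALT all match."""
    header, records = parse_vcf(input_text)
    _, db_records = parse_vcf(db_text)

    # Index database INFO fields by the full variant key, not just the position.
    db_index = {}
    for rec in db_records:
        key = (rec[0], rec[1], rec[3], rec[4])
        db_index[key] = dict(
            item.split("=", 1) for item in rec[7].split(";") if "=" in item
        )

    out = list(header)
    for rec in records:
        key = (rec[0], rec[1], rec[3], rec[4])
        db_info = db_index.get(key, {})
        extra = [f"{tag}={db_info[tag]}" for tag in tags if tag in db_info]
        if extra:
            # An INFO value of "." means "empty", so it is replaced rather than extended.
            existing = [] if rec[7] == "." else [rec[7]]
            rec = rec[:7] + [";".join(existing + extra)]
        out.append("\t".join(rec))
    return "\n".join(out)


INPUT = """##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G A 100 PASS A=3.0"""

DB1 = """##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G A 100 PASS B=4.0"""

DB2 = """##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G T 100 PASS B=4.0"""

# INFO column of the annotated record for each database
print(annotate(INPUT, DB1, ["B"]).splitlines()[-1].split("\t")[7])  # A=3.0;B=4.0
print(annotate(INPUT, DB2, ["B"]).splitlines()[-1].split("\t")[7])  # A=3.0
```

The key point is that the lookup key includes REF and ALT; a tool that keys only on CHROM and POS would add B=4.0 for both databases.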
Just to clarify, this is what I want in the case described:
$ some_tool input.vcf db1.vcf # SNP in input and database matches
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G A 100 PASS A=3.0;B=4.0
$ some_tool input.vcf db2.vcf # SNP in input and database do not match
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 878638 . G A 100 PASS A=3.0
Just tested, and it does precisely what I want. Thank you so much!
Also I see that you are a maintainer and developer on bcftools, so double thank you for both the tool and your answer :-)
Thank you so much, Shane McCarthy. I was stuck on this for a while, and this does exactly what I was looking for!