OK, I have searched for this everywhere, and I just can't seem to even figure out if it is possible/meaningful to annotate VCFs (add tags and associated data) with external 'genotype' (##FORMAT=<ID=VALUE,Number=VALUE,Type=VALUE,Description="VALUE">) fields.
I know that bcftools annotate and other tools can add INFO tags and can exclude FORMAT tags. However, the information that I need to add does not make sense except when associated with a specific sample, while INFO tags apply to the variant without regard to the sample in which it occurs.
For example, I am comparing multiple family triplets (mother/father/affected child). I would like to add a tag in the FORMAT field that represents which triplet the individual belongs to. In addition, I would like to add information to each sample that indicates which 'mode of inheritance' the SNP appears to follow in each triplet.
This is information that is inherently tied to the sample and therefore ill-suited to the INFO-type tag; however, I cannot for the life of me find a tool that even mentions this. Am I missing some super-obvious reason that people don't ever need/want to be able to annotate VCFs in this fashion? Or is my google-fu simply too weak?
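To make the request concrete, here is a minimal sketch in plain Python of the transformation I am after: appending extra keys to the FORMAT column of a record and one matching value to each sample column. The function name `add_format_tags`, the example record, and the family ID `fam1` are all hypothetical; this is string manipulation on a single tab-delimited line, not a replacement for a real VCF library, and it does not add the matching ##FORMAT header lines.

```python
def add_format_tags(vcf_line, tags):
    """Append per-sample FORMAT values to one tab-delimited VCF data line.

    `tags` maps a FORMAT key to a list holding one value per sample,
    in the same order as the sample columns.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # Columns 1-8 are CHROM..INFO, column 9 is FORMAT, the rest are samples.
    fixed, fmt, samples = fields[:8], fields[8], fields[9:]
    for key, values in tags.items():
        if len(values) != len(samples):
            raise ValueError(
                f"{key}: expected {len(samples)} values, got {len(values)}"
            )
        fmt += ":" + key
        samples = [s + ":" + str(v) for s, v in zip(samples, values)]
    return "\t".join(fixed + [fmt] + samples)


# Hypothetical record with three samples (mother, father, child).
line = "1\t12921499\t.\tA\tG\t50\tPASS\tDP=90\tGT\t0/1\t0/0\t0/1"
annotated = add_format_tags(line, {
    "FAM_ID": ["fam1", "fam1", "fam1"],
    "MOI": ["CmpHet", "CmpHet", "CmpHet"],
})
print(annotated)
```

The INFO column (DP=90 here) is left untouched; only the FORMAT column and the per-sample columns grow.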
For your reference, I will share what I have attempted using bcftools annotate (all zipping and indexing of related files has been omitted here for brevity):
annots.tab.gz
CHROM POS AGE_MO BAM_OK FAM_ID MOI
1 12921499 30 0 youdontknowme CmpHet
1 12921600 30 0 youdontknowme CmpHet
1 12939476 30 0 youdontknowme CmpHet
1 12939562 30 0 youdontknowme CmpHet
1 12939747 30 0 youdontknowme CmpHet
1 12942047 30 1 youdontknowme CmpHet
1 12942138 30 1 youdontknowme CmpHet
1 12942179 30 1 youdontknowme CmpHet
...
annots.hdr
##FORMAT=<ID=AGE_MO,Number=1,Type=Float,Description="Age of associated proband in months.">
##FORMAT=<ID=FAM_ID,Number=1,Type=String,Description="Identification of family to which the individual belongs.">
##FORMAT=<ID=BAM_OK,Number=0,Type=Flag,Description="Manual inspection of the BAM file corroborates the MOI.">
##FORMAT=<ID=MOI,Number=1,Type=String,Description="Mode of Inheritance: HZR=recessive, DeNovo=de novo, XL=X-linked, CmpHet=Compound Het">
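A side note on this header: as far as I can tell from the VCF specification, FORMAT fields may only be of type Integer, Float, Character, or String, with Flag reserved for INFO. If that reading is right, the BAM_OK line is a separate problem from the error reported below. The sketch below is a small, hypothetical checker (the names `ALLOWED_FORMAT_TYPES` and `invalid_format_lines` are mine) that scans header lines for FORMAT definitions with a disallowed type.

```python
import re

# Types the VCF specification lists for FORMAT fields (Flag is INFO-only).
ALLOWED_FORMAT_TYPES = {"Integer", "Float", "Character", "String"}

# Matches the leading ID/Number/Type part of a ##FORMAT header line.
HEADER_RE = re.compile(r"^##FORMAT=<ID=([^,]+),Number=([^,]+),Type=([^,]+),")


def invalid_format_lines(header_lines):
    """Return (ID, Type) pairs for FORMAT lines whose Type is not allowed."""
    bad = []
    for line in header_lines:
        m = HEADER_RE.match(line)
        if m and m.group(3) not in ALLOWED_FORMAT_TYPES:
            bad.append((m.group(1), m.group(3)))
    return bad


header = [
    '##FORMAT=<ID=AGE_MO,Number=1,Type=Float,Description="Age of associated proband in months.">',
    '##FORMAT=<ID=FAM_ID,Number=1,Type=String,Description="Identification of family to which the individual belongs.">',
    '##FORMAT=<ID=BAM_OK,Number=0,Type=Flag,Description="Manual inspection of the BAM file corroborates the MOI.">',
    '##FORMAT=<ID=MOI,Number=1,Type=String,Description="Mode of Inheritance: HZR=recessive, DeNovo=de novo, XL=X-linked, CmpHet=Compound Het">',
]
print(invalid_format_lines(header))  # [('BAM_OK', 'Flag')]
```

One possible workaround, which I have not tested, would be to declare BAM_OK as Number=1,Type=Integer and keep the 0/1 values from the table.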
bcftools command
bcftools annotate -a annots.tab.gz -h annots.hdr -c CHROM,POS,AGE_MO,BAM_OK,FAM_ID,MOI data.vcf.bgz -Ou -o annotated.data.bcf
Resulting error
The tag "AGE_MO" is not defined in annots.tab.gz
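My current guess about this error: the bcftools annotate documentation says that in the -c list INFO tags can be written as "INFO/TAG" or just "TAG", while FORMAT tags are written as "FORMAT/TAG" or "FMT/TAG". So a bare AGE_MO would be looked up as an INFO tag, and my header only declares it as FORMAT. The sketch below mimics just that naming rule in Python; `declared_tags` and `undeclared_columns` are hypothetical helpers, not bcftools code, and they say nothing about how the tab file itself would need to be laid out for per-sample values.

```python
import re


def declared_tags(header_lines):
    """Collect (kind, ID) pairs, e.g. ("FORMAT", "MOI"), from header lines."""
    pat = re.compile(r"^##(INFO|FORMAT)=<ID=([^,>]+)")
    return {m.groups() for m in map(pat.match, header_lines) if m}


def undeclared_columns(columns, header_lines):
    """Return tag columns whose implied kind is not declared in the header."""
    declared = declared_tags(header_lines)
    # Coordinate/allele columns and the "ignore" marker are not tags.
    positional = {"CHROM", "POS", "REF", "ALT", "FROM", "TO", "BEG", "END", "-"}
    missing = []
    for col in columns.split(","):
        if col in positional:
            continue
        if col.startswith(("FORMAT/", "FMT/")):
            kind, tag = "FORMAT", col.split("/", 1)[1]
        elif col.startswith("INFO/"):
            kind, tag = "INFO", col.split("/", 1)[1]
        else:
            kind, tag = "INFO", col  # a bare name is treated as an INFO tag
        if (kind, tag) not in declared:
            missing.append(col)
    return missing


header = [
    '##FORMAT=<ID=AGE_MO,Number=1,Type=Float,Description="Age of associated proband in months.">',
    '##FORMAT=<ID=FAM_ID,Number=1,Type=String,Description="Identification of family to which the individual belongs.">',
    '##FORMAT=<ID=BAM_OK,Number=0,Type=Flag,Description="Manual inspection of the BAM file corroborates the MOI.">',
    '##FORMAT=<ID=MOI,Number=1,Type=String,Description="Mode of Inheritance: HZR=recessive, DeNovo=de novo, XL=X-linked, CmpHet=Compound Het">',
]

# Bare names: every tag is looked up as INFO and none is declared that way.
print(undeclared_columns("CHROM,POS,AGE_MO,BAM_OK,FAM_ID,MOI", header))
# Prefixed names: all four match the FORMAT declarations.
print(undeclared_columns("CHROM,POS,FMT/AGE_MO,FMT/BAM_OK,FMT/FAM_ID,FMT/MOI", header))
```

If this reading is correct, AGE_MO is simply the first bare column that fails the INFO lookup, which would match the message above.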
Is there an example of how to do this? I am not seeing any examples listed for adding FORMAT tags.