I am looking for a VCF manipulation tool (or tools) that will allow me to do the following:
- Add arbitrary INFO/FORMAT tags to a VCF with user-defined values based on an expression that may reference existing INFO/FORMAT tags.
Example: Create a new tag, INFO/EXAMPLE with a value of 'myexample' for all variants that meet the criteria INFO/DP < 10 OR QUAL < 20
- Do database joins on INFO/FORMAT tags in a VCF to create new tags.
Example: Left join my VCF (on tag INFO/EXAMPLE) to a tab-delimited file (on field EXAMPLE), and add a new tag INFO/EXAMPLE2 to the left file using the value in the EXAMPLE2 field from the matching row of the right file
I think BCFtools annotate can nearly do what I want for the first use case, but I haven't found a clean way to do this that doesn't involve some messy additional manipulation outside bcftools. The second use case might just be easier to handle in R/python outside of VCF format entirely, and that is an option if there's nothing out-of-the-box that will do it for me.
Thanks.
Yes, the first one is a good use case for bcftools annotate, with the -i flag picking the appropriate variants to add the INFO/EXAMPLE tag to. You will also need to create a file containing the header line for the new tag and pass it to the command with -h.
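If you do end up handling this outside bcftools, as the question suggests for the second use case, here is a minimal sketch in plain Python that covers both steps. It is not a bcftools feature, just one way it could look. It assumes an uncompressed VCF, that EXAMPLE and EXAMPLE2 are not already declared in the header, and that tag values contain no spaces, semicolons or equals signs. The helper names (`parse_info`, `format_info`, `load_lookup`, `annotate`), the header descriptions and the toy data are all made up for illustration.

```python
import csv
import io


def parse_info(info):
    """Split an INFO string into a dict; flag-style keys map to None."""
    out = {}
    if info in (".", ""):
        return out
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else None
    return out


def format_info(info):
    """Turn the dict back into an INFO string ('.' when empty)."""
    if not info:
        return "."
    return ";".join(k if v is None else f"{k}={v}" for k, v in info.items())


def load_lookup(handle):
    """Read the tab-delimited table into a dict keyed on the EXAMPLE column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["EXAMPLE"]: row["EXAMPLE2"] for row in reader}


def annotate(vcf_lines, lookup):
    """Yield VCF lines with INFO/EXAMPLE and INFO/EXAMPLE2 added where applicable."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        if line.startswith("#CHROM"):
            # New tags are declared just before the column header line.
            # The descriptions here are placeholders.
            yield '##INFO=<ID=EXAMPLE,Number=1,Type=String,Description="DP<10 or QUAL<20">'
            yield '##INFO=<ID=EXAMPLE2,Number=1,Type=String,Description="Joined from lookup table">'
            yield line
            continue
        fields = line.split("\t")
        info = parse_info(fields[7])
        qual = None if fields[5] == "." else float(fields[5])
        depth = info.get("DP")
        # Use case 1: tag variants with INFO/DP < 10 OR QUAL < 20.
        low_depth = depth is not None and float(depth) < 10
        low_qual = qual is not None and qual < 20
        if low_depth or low_qual:
            info["EXAMPLE"] = "myexample"
        # Use case 2: left join on INFO/EXAMPLE; unmatched records stay as they are.
        key = info.get("EXAMPLE")
        if key in lookup:
            info["EXAMPLE2"] = lookup[key]
        fields[7] = format_info(info)
        yield "\t".join(fields)


# Toy input, invented for this example.
VCF = """##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t100\t.\tA\tG\t50\tPASS\tDP=5
1\t200\t.\tC\tT\t60\tPASS\tDP=30
"""
TABLE = "EXAMPLE\tEXAMPLE2\nmyexample\tflagged\n"

lookup = load_lookup(io.StringIO(TABLE))
result = list(annotate(io.StringIO(VCF), lookup))
for out_line in result:
    print(out_line)
```

For real files you would replace the `io.StringIO` objects with open file handles (or `gzip.open(..., "rt")` for a compressed VCF) and write the yielded lines to a new file. Records that do not match the lookup table are passed through unchanged, which is what makes this a left join.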