Question

Commands to do the following in BCFtools or recommendations for another tool

0

Entering edit mode

20 months ago

ex_nihilo • 0

I am looking for a VCF manipulation tool (or tools) that will allow me to do the following:

Add arbitrary INFO/FORMAT tags to a VCF with user-defined values based on an expression that may reference existing INFO/FORMAT tags.

Example: Create a new tag, INFO/EXAMPLE with a value of 'myexample' for all variants that meet the criteria INFO/DP < 10 OR QUAL < 20

Do database joins on INFO/FORMAT tags in a VCF to create new tags.

Example: Left join my VCF (on tag INFO/EXAMPLE) to a tab-delimited file (on field EXAMPLE), and add a new tag INFO/EXAMPLE2 to the left file using the value in the EXAMPLE2 field from the matching row of the right file

I think BCFtools annotate can nearly do what I want for the first use case, but I haven't found a clean way to do this that doesn't involve some messy additional manipulation outside bcftools. The second use case might just be easier to handle in R/python outside of VCF format entirely, and that is an option if there's nothing out-of-the-box that will do it for me.

Thanks.

bcftools vcf • 1.1k views

ADD COMMENT • link updated 20 months ago by barslmn ★ 2.4k • written 20 months ago by ex_nihilo • 0

0

Entering edit mode

Yes, the first one is a good use case for bcftools annotate with the -i flag picking the appropriate variants to add the INFO/EXAMPLE tag to. You will also need to create a pass the header line in a file.

ADD REPLY • link 20 months ago by Ram 45k

score 1 · Answer 1 · 2023-10-04

These both can be solved with bcftools. First one is quite similar to examples on how to site: http://samtools.github.io/bcftools/howtos/annotate.html For second one you can left join with the join command. I don't think it is messier than any python or R solution.

Example: Create a new tag, INFO/EXAMPLE with a value of 'myexample' for all variants that meet the criteria INFO/DP < 10 OR QUAL < 20

$ bcftools view Test.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##bcftools_viewVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_viewCommand=view Test.vcf; Date=Wed Oct  4 17:04:41 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       .       GT      1/1
chr1    4       .       T       G       4       .       .       GT      1/1

$ bcftools view -i "QUAL>=3" Test.vcf | bcftools query -f "%CHROM\t%POS\tmyexample\n" | bgzip -c > annot.txt.gz

$ tabix -s1 -b2 -e2 annot.txt.gz

$ echo -e '##INFO=<ID=MYEXAMPLE2,Number=1,Type=String,Description="biostar">' >> hdr.txt

$ bcftools annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE Test.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=MYEXAMPLE,Number=1,Type=String,Description="biostar">
##bcftools_annotateVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_annotateCommand=annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE Test.vcf; Date=Wed Oct  4 17:04:59 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       MYEXAMPLE=myexample      GT      1/1
chr1    4       .       T       G       4       .       MYEXAMPLE=myexample      GT      1/1

Example: Left join my VCF (on tag INFO/EXAMPLE) to a tab-delimited file (on field EXAMPLE), and add a new tag INFO/EXAMPLE2 to the left file using the value in the EXAMPLE2 field from the matching row of the right file

$ cat test.tsv
match   hi

$ bcftools view Test2.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=MYEXAMPLE,Number=1,Type=String,Description="biostar">
##bcftools_viewVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_viewCommand=view Test2.vcf; Date=Wed Oct  4 17:20:00 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       MYEXAMPLE=match GT       1/1
chr1    4       .       T       G       4       .       MYEXAMPLE=nomatch GT      1/1

$ bcftools query -f "%MYEXAMPLE\t%CHROM\t%POS\n" Test2.vcf > exampletags

$ join -t$'\t' exampletags test.tsv | cut -f 2-4 | bgzip -c > annot.txt.gz

$ tabix -s1 -b2 -e2 annot.txt.gz

$ echo -e '##INFO=<ID=MYEXAMPLE2,Number=1,Type=String,Description="biostar">' >> hdr.txt

$ bcftools annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE2 Test2.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=MYEXAMPLE,Number=1,Type=String,Description="biostar">
##INFO=<ID=MYEXAMPLE2,Number=1,Type=String,Description="biostar">
##bcftools_annotateVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_annotateCommand=annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE2 Test2.vcf; Date=Wed Oct  4 17:20:32 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       MYEXAMPLE=match;MYEXAMPLE2=hi    GT      1/1
chr1    4       .       T       G       4       .       MYEXAMPLE=nomatch    GT      1/1