Commands to do the following in BCFtools or recommendations for another tool
1
0
Entering edit mode
14 months ago
ex_nihilo • 0

I am looking for a VCF manipulation tool (or tools) that will allow me to do the following:

  • Add arbitrary INFO/FORMAT tags to a VCF with user-defined values based on an expression that may reference existing INFO/FORMAT tags.

Example: Create a new tag, INFO/EXAMPLE with a value of 'myexample' for all variants that meet the criteria INFO/DP < 10 OR QUAL < 20

  • Do database joins on INFO/FORMAT tags in a VCF to create new tags.

Example: Left join my VCF (on tag INFO/EXAMPLE) to a tab-delimited file (on field EXAMPLE), and add a new tag INFO/EXAMPLE2 to the left file using the value in the EXAMPLE2 field from the matching row of the right file

I think BCFtools annotate can nearly do what I want for the first use case, but I haven't found a clean way to do this that doesn't involve some messy additional manipulation outside bcftools. The second use case might just be easier to handle in R/python outside of VCF format entirely, and that is an option if there's nothing out-of-the-box that will do it for me.

Thanks.

bcftools vcf • 709 views
ADD COMMENT
0
Entering edit mode

Yes, the first one is a good use case for bcftools annotate with the -i flag picking the appropriate variants to add the INFO/EXAMPLE tag to. You will also need to create a pass the header line in a file.

ADD REPLY
1
Entering edit mode
14 months ago
barslmn ★ 2.3k

These both can be solved with bcftools. First one is quite similar to examples on how to site: http://samtools.github.io/bcftools/howtos/annotate.html For second one you can left join with the join command. I don't think it is messier than any python or R solution.

Example: Create a new tag, INFO/EXAMPLE with a value of 'myexample' for all variants that meet the criteria INFO/DP < 10 OR QUAL < 20

$ bcftools view Test.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##bcftools_viewVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_viewCommand=view Test.vcf; Date=Wed Oct  4 17:04:41 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       .       GT      1/1
chr1    4       .       T       G       4       .       .       GT      1/1

$ bcftools view -i "QUAL>=3" Test.vcf | bcftools query -f "%CHROM\t%POS\tmyexample\n" | bgzip -c > annot.txt.gz

$ tabix -s1 -b2 -e2 annot.txt.gz

$ echo -e '##INFO=<ID=MYEXAMPLE2,Number=1,Type=String,Description="biostar">' >> hdr.txt

$ bcftools annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE Test.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=MYEXAMPLE,Number=1,Type=String,Description="biostar">
##bcftools_annotateVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_annotateCommand=annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE Test.vcf; Date=Wed Oct  4 17:04:59 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       MYEXAMPLE=myexample      GT      1/1
chr1    4       .       T       G       4       .       MYEXAMPLE=myexample      GT      1/1

Example: Left join my VCF (on tag INFO/EXAMPLE) to a tab-delimited file (on field EXAMPLE), and add a new tag INFO/EXAMPLE2 to the left file using the value in the EXAMPLE2 field from the matching row of the right file

$ cat test.tsv
match   hi

$ bcftools view Test2.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=MYEXAMPLE,Number=1,Type=String,Description="biostar">
##bcftools_viewVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_viewCommand=view Test2.vcf; Date=Wed Oct  4 17:20:00 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       MYEXAMPLE=match GT       1/1
chr1    4       .       T       G       4       .       MYEXAMPLE=nomatch GT      1/1

$ bcftools query -f "%MYEXAMPLE\t%CHROM\t%POS\n" Test2.vcf > exampletags

$ join -t$'\t' exampletags test.tsv | cut -f 2-4 | bgzip -c > annot.txt.gz

$ tabix -s1 -b2 -e2 annot.txt.gz

$ echo -e '##INFO=<ID=MYEXAMPLE2,Number=1,Type=String,Description="biostar">' >> hdr.txt

$ bcftools annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE2 Test2.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=MYEXAMPLE,Number=1,Type=String,Description="biostar">
##INFO=<ID=MYEXAMPLE2,Number=1,Type=String,Description="biostar">
##bcftools_annotateVersion=1.18-6-gc7cbe0b3+htslib-1.18-9-gd89f1fe-dirty
##bcftools_annotateCommand=annotate -a annot.txt.gz -h hdr.txt -c CHROM,POS,INFO/MYEXAMPLE2 Test2.vcf; Date=Wed Oct  4 17:20:32 2023
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  name_placeholder
chr1    1       .       T       G       1       .       .       GT      1/1
chr1    2       .       T       G       2       .       .       GT      1/1
chr1    3       .       T       G       3       .       MYEXAMPLE=match;MYEXAMPLE2=hi    GT      1/1
chr1    4       .       T       G       4       .       MYEXAMPLE=nomatch    GT      1/1
ADD COMMENT

Login before adding your answer.

Traffic: 2111 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6