You need to switch from VCFtools to BCFtools, in particular bcftools query.
It looks like you not only want certain columns but also certain key-value pairs within the primary VCF columns, which are tab-delimited.
Here are some examples from one of my own VCFs that should help:
bcftools query -f'[%CHROM:%POS %GT\n]' 2701.snvindel.var.vcf.gz | head -5
1:69511 1/1
1:69761 0/1
1:752721 0/1
1:752894 1/1
1:762273 0/1
The same query with REF, ALT and the sample name added:
bcftools query -f'[%CHROM:%POS:%REF:%ALT %SAMPLE %GT\n]' 2701.snvindel.var.vcf.gz | head -5
1:69511:A:G 2701 1/1
1:69761:A:T 2701 0/1
1:752721:A:G 2701 0/1
1:752894:T:C 2701 1/1
1:762273:G:A 2701 0/1
It should be fairly obvious what those are doing. To extract certain values from the INFO column, which appears to be what you need, you can do the following:
bcftools query -f'[%CHROM:%POS:%REF:%ALT %INFO/HaplotypeScore:%INFO/VQSLOD %SAMPLE %GT\n]' 2701.snvindel.var.vcf.gz | head -5
1:69511:A:G 0.9159:-6.231 2701 1/1
1:69761:A:T 0:-9.034 2701 0/1
1:752721:A:G 0:-1.447 2701 0/1
1:752894:T:C 0:-6.798 2701 1/1
1:762273:G:A 5.3647:-2.236 2701 0/1
Here, HaplotypeScore and VQSLOD are tags defined in my INFO field.
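Your own VCF will most likely define different INFO tags, so it helps to list them before writing the query format string. Here is a minimal sketch that pulls the tag IDs out of the ##INFO header lines with grep and sed. The header written to the temporary file is a made-up toy, not taken from my VCF; substitute your own file.

```shell
# Toy VCF header, made up for illustration; point $vcf at your own file instead.
vcf=$(mktemp)
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="toy entry">' \
  '##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="toy entry">' \
  > "$vcf"

# Keep only the ##INFO header lines and strip everything except the tag ID.
info_tags=$(grep '^##INFO=<ID=' "$vcf" | sed 's/^##INFO=<ID=\([^,]*\),.*/\1/')
echo "$info_tags"

rm -f "$vcf"
```

For a compressed file, you can feed the header from bcftools view -h into the same grep and sed pipeline. Each ID it prints can then be used as %INFO/ID in the bcftools query format string.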
Kevin
If you want a GUI-based program, this is the one to use.
Please post the input VCF (with headers and a few example records) and the columns you want to extract, @OP.
Hey guys, I ended up using some Perl scripting to fix my issue. I realized that everything was being printed in the 9th column, i.e. Exac|gnomad|..|..|, so I ended up splitting that column and then pasting / joining the ones I needed. :) Thank you all for the help!
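For anyone who finds this later, the split-and-join step can also be sketched in awk rather than Perl. This is not the script used above, only an illustration of the same idea. The record and the sub-field positions are invented for the example, so check the order of the pipe-delimited entries in your own file first.

```shell
# Made-up record: a variant key followed by a pipe-delimited annotation string.
line='1:69511 0.12|0.08|missense|GENE1'

# Split the second whitespace-separated field on "|" and keep sub-fields 1 and 2.
picked=$(printf '%s\n' "$line" | awk '{ split($2, a, "|"); print $1, a[1], a[2] }')
echo "$picked"
```

To keep other sub-fields, change the indices into the array a. The output of bcftools query can be piped into the same awk command.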
You're welcome dude