Here is the awk script I wrote to extract INFO tags for each line from VCF(-like) files:
USAGE: awk -F"\t" -v InfoColumns="8,16,24" -v TAGS="txGN,DP,DP4,CLR" -f extract.vcf.info.Tag.awk union.7samples.tsv
#!/bin/awk -f
## extract.vcf.info.Tag.awk
## INPUT: a VCF, or a (pasted) TSV that keeps the VCF INFO field intact
## USAGE: awk -v InfoColumns="8,16,24" -v TAGS="txGN,DP,DP4,CLR" -f extract.vcf.info.Tag.awk union.7samples.tsv
# BEGIN { FS = "\t" }   # uncomment if -F"\t" is not given on the command line

# Skip header/comment lines (those starting with # or ##)
!/^#/ {
    printf "%s", $0                      # print the original line; "%s" keeps any % in the data literal
    nkey = split(TAGS, key, ",")         # split() indexes from 1, so the given order is preserved
    ncol = split(InfoColumns, col, ",")
    for (i = 1; i <= ncol; i++) {
        ninfo = split($(col[i]), info, ";")
        for (j = 1; j <= nkey; j++) {
            value = "."                  # "." if the tag is absent from this column
            for (f = 1; f <= ninfo; f++) {
                eq = index(info[f], "=")
                # compare the whole key, so DP does not match ADP or DP4
                if (eq > 0 && substr(info[f], 1, eq - 1) == key[j]) {
                    value = substr(info[f], eq + 1)
                    gsub(/"/, "", value) # drop any double quotes
                    break
                }
            }
            printf "\t%s", value         # exactly one output field per (column, tag)
        }
    }
    printf "\n"
}
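To check the key-matching step on its own, here is a minimal sketch as a shell function wrapping plain awk. The function name get_info_tag and the sample INFO strings are made up for illustration; the idea is only to show that comparing the whole key before the "=" keeps DP from picking up ADP or DP4.

```shell
# Hypothetical helper (not part of the script above):
# print the value of one INFO tag, or "." if the tag is absent.
# Usage: get_info_tag "<INFO string>" <TAG>
get_info_tag() {
    printf '%s\n' "$1" | awk -v tag="$2" '{
        n = split($0, item, ";")          # INFO entries are ";"-separated
        value = "."                        # default when the tag is missing
        for (i = 1; i <= n; i++) {
            eq = index(item[i], "=")
            # compare the whole key before "=", so DP never matches ADP or DP4
            if (eq > 0 && substr(item[i], 1, eq - 1) == tag) {
                value = substr(item[i], eq + 1)
                break
            }
        }
        print value
    }'
}

get_info_tag "ADP=7;DP4=1,2,3,4;DP=10" DP    # prints 10
get_info_tag "ADP=7;DP4=1,2,3,4" DP          # prints .
```

Flag-style tags without "=" (e.g. INDEL) are reported as "." by this sketch, since there is no value to print.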
Use the vcftools --get-INFO option. Your command would then be:
./vcftools --vcf your_vcf_file.vcf --get-INFO txGN --out vcf_file_gene_name_info
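According to the vcftools manual, --get-INFO can be given more than once, and the result is written as a tab-separated table to a file with the .INFO suffix. A rough sketch for turning a comma-separated tag list like the TAGS variable from the question into repeated flags follows. The helper name get_info_flags is hypothetical, and the command is only echoed, not run, so it can be inspected first.

```shell
# Hypothetical helper: turn "txGN,DP,DP4,CLR" into repeated --get-INFO flags.
get_info_flags() {
    printf '%s\n' "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            sep = (i > 1) ? " " : ""      # single space between flags
            out = out sep "--get-INFO " $i
        }
        print out
    }'
}

TAGS="txGN,DP,DP4,CLR"
# Print the command instead of running it.
# The file name and --out prefix are the placeholders from the answer.
echo "./vcftools --vcf your_vcf_file.vcf $(get_info_flags "$TAGS") --out vcf_file_gene_name_info"
```

Note that this works on the single INFO column of a real VCF; for a pasted TSV with several INFO columns (8, 16, 24 in the question), the awk approach is still needed.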