Here is the awk script I wrote to extract INFO tags for each line from VCF(-like) files:
USAGE: awk -F"\t" -v InfoColumns="8,16,24" -v TAGS="txGN,DP,DP4,CLR" -f extract.vcf.info.Tag.awk union.7samples.tsv
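# Process data lines only; skip header lines whose first or second character is "#".
(substr($1,1,1)!="#" && substr($1,2,1)!="#") {
    printf "%s", $0;                          # print the line literally; "%" in the data is safe
    k = split(TAGS, key, ",");                # tags to extract, in the order given
    n = split(InfoColumns, col, ",");         # INFO column numbers, in the order given
    for (i = 1; i <= n; i++) {
        m = split($(col[i]), info, ";");      # key=value pairs of this INFO column
        for (j = 1; j <= k; j++) {
            pat1 = "^" key[j] "=";            # anchored, so "DP=" cannot match "MDP="
            found = 0;
            for (f = 1; f <= m; f++) {
                if (info[f] ~ pat1) {
                    val = info[f];            # work on a copy so info[] stays intact
                    sub(pat1, "", val);
                    gsub(/"/, "", val);       # strip double quotes from the value
                    printf "\t%s", val;       # print the extracted INFO tag value
                    found = 1;
                    break;                    # one value per tag keeps the columns aligned
                }
            }
            if (!found)
                printf "\t.";                 # print "." if the tag is not present
        }
    }
    printf "\n";
}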
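To show why the key has to be compared as a whole, here is a minimal sketch of the per-tag lookup as a shell function wrapping awk. The function name extract_tag and the sample INFO string are made up for illustration and are not part of the script above. Instead of a regex, it compares everything before the first "=" with the requested tag.

```shell
# Hypothetical helper (not part of the original script): print the value of
# one INFO tag from an INFO string, or "." when the tag is absent.
extract_tag() {
    # $1 = tag name, $2 = INFO string such as "DP=10;DP4=1,2,3,4"
    awk -v tag="$1" -v info="$2" 'BEGIN {
        value = "."
        n = split(info, field, ";")
        for (i = 1; i <= n; i++) {
            # Compare the whole key, so "DP" cannot match "DP4" or "MDP".
            eq = index(field[i], "=")
            if (eq > 0 && substr(field[i], 1, eq - 1) == tag) {
                value = substr(field[i], eq + 1)
                break
            }
        }
        print value
    }'
}

extract_tag DP  "MDP=99;DP4=1,2,3,4;DP=10"   # prints 10
extract_tag CLR "MDP=99;DP4=1,2,3,4;DP=10"   # prints .
```

An unanchored regex such as "DP=" would also match the MDP=99 field here, which is why the lookup compares the full key.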
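Use the --get-INFO option of vcftools. It can be given more than once to extract several tags, and the result is written to a file with the .INFO suffix. For a single tag, your command would be: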
./vcftools --vcf your_vcf_file.vcf --get-INFO txGN --out vcf_file_gene_name_info