Hello,
I have merged three vcf files and translated the SNP calls using bcftools query and awk. I want to change the heterozygous calls as IUPAC codes:
Code Base
R A or G
Y C or T
S G or C
W A or T
K G or T
M A or C
N any base
I tried using:
bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if(gt[1]==gt[2]) $i=gt[1]; else if(gt[2]=="A" || gt[2]=="G") $i="R"; else if(gt[2]=="C" || gt[2]=="T") $i="Y"; else if(gt[2]=="G" || gt[2]=="C") $i="S"; else if(gt[2]=="A" || gt[2]=="T") $i="W"; else if(gt[2]=="G" || gt[2]=="T") $i="K"; else if(gt[2]=="A" || gt[2]=="C") $i="M"; else $i="N";} print }' > out.vcf
But it gives wrong IUPAC codes:
CHROM POS ID REF ALT File1 File2 File3
1 441 . G T T Y NA
1 447 . G A A R NA
1 452 . G A A A NA
1 468 . G T T Y NA
1 473 . T C C C NA
1 474 . G T T Y NA
1 489 . T A A A NA
1 555 . A G G G NA
1 577 . T G G G NA
1 578 . A T T T NA
1 909 . C A A NA NA
But it gives wrong IUPAC codes for example, for G or T (pos 468 for File 2), it should give a K but gives Y. I think I am missing something in the command here. Thank you for your help!
Thank you, @fin swimmer! It worked perfectly. I accidentally deleted my post and I did not know how to recover it :/ Thank you!
The above command converted
A\G
orG\A
toR
as:But it converts all the heterozygous calls to
R
only. There is no error and the file is produced but only toR
. I am not sure what is missing.