Entering edit mode
5.3 years ago
evelyn
▴
230
Hello,
I have merged three vcf files and translated the SNP calls using bcftools query and awk
$ bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="-"; else if(gt[1]==gt[2]) $i=gt[1]; else $i="N";} print }'
1 687 . G A - A -
1 689 . G A - A -
1 701 . T A - A -
1 704 . T G - G -
1 708 . C T,A T - A
1 6440 . G T T N -
Here the heterozygous calls are called as N
. I want to change the following heterozygous calls as IUPAC codes:
Code Base
R A or G
Y C or T
S G or C
W A or T
K G or T
M A or C
N any base
I tried using:
bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' merge.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="NA"; else if(gt[1]==gt[2]) $i=gt[1]; else if(gt[2]=="AG") $i="R"; else if(gt[2]=="CT") $i="Y"; else if(gt[2]=="GC") $i="S"; else if(gt[2]=="AT") $i="W"; else if(gt[2]=="GT") $i="K"; else if(gt[2]=="AC") $i="M"; else $i="N";} print }' > out.vcf
But it does not give IUPAC codes for the above heterozygous calls. Any suggestions/help is appreciated.
Thank you so much for your time!
You are checking if the second value of the genotype after splitting by
/
isAG
. This is not what you want. You would like to check ifgt[1]
isA
andgt[2]
isG
or vice versa.