Hello Everyone,
I have found posts that use awk or grep to skip the column names and apply an awk operation to the rest of the file, but I could not adapt them to what I am trying to do. I have merged three VCF files and translated the genotype numbers to base calls using:
bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT[\t%TGT]\n' in.vcf | awk -v FS="\t" -v OFS="\t" '{for(i=6;i<=NF;i++) {split($i, gt, "/"); if(gt[1]==".") $i="-"; else if(gt[1]==gt[2]) $i=gt[1]; else $i="N";} print }' > out.vcf
Here, else $i="N" replaces the column names with N as well, like this:
# [1]CHROM [2]POS [3]ID [4]REF [5]ALT N N N
However, I want to keep those column names (based on my file names) for further analysis like this:
# [1]CHROM [2]POS [3]ID [4]REF [5]ALT file1 file2 file3
I would appreciate any help. Thank you!
Can you please explain what you are trying to achieve?
I merged the vcf files and then called the genotypes. Then I changed the 1/1 0/1 0/0 ./. genotype calls to bases using awk. The else $i="N" branch turns heterozygous calls (0/1 or 1/0) into 'N'. But I want to keep the column names, which are also changed to 'N'. Thank you!
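One possible fix, as a minimal sketch: the header line does not contain genotypes, so the loop falls through to the final else and overwrites every sample column with N. Printing any line that starts with # unchanged and skipping to the next line should avoid that. This assumes the header begins with # as in the example above; the function name collapse_genotypes and the sample rows are hypothetical, used only for illustration.

```shell
# Hypothetical helper: same awk logic as in the question, plus a header guard.
collapse_genotypes() {
  awk -v FS="\t" -v OFS="\t" '
    /^#/ { print; next }               # header line: print as-is, skip the loop
    {
      for (i = 6; i <= NF; i++) {
        split($i, gt, "/")
        if (gt[1] == ".")        $i = "-"      # missing call
        else if (gt[1] == gt[2]) $i = gt[1]    # homozygous: keep the base
        else                     $i = "N"      # heterozygous
      }
      print
    }'
}

# Made-up input standing in for the bcftools query output
printf '# [1]CHROM\t[2]POS\t[3]ID\t[4]REF\t[5]ALT\tfile1\tfile2\tfile3\n1\t100\trs1\tA\tG\tA/A\tA/G\t./.\n' |
  collapse_genotypes
```

With the real data, the bcftools query command from the question would be piped into collapse_genotypes in place of the printf line. If the header is always the first line, NR==1 could be used as the guard instead of /^#/.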