Hi
From my VCF file, I need to generate a text file with the following columns (example rows only):
CHROM POS ID REF ALT AA
1 886817 rs11174805 C G C
1 886817 rs111111111 C T .
1 886817 rs11144444 A T A
Here the AA column is the ancestral allele, which can be obtained from the INFO column of the VCF file, where it appears as "AA=allele_name". In the second row, "." means that the INFO column has "AA=.", i.e. the ancestral allele is unknown. I have extracted the columns from my VCF file using:
awk 'BEGIN {OFS ="\t" ; FS = "\t"};{print $1, $2, $3, $4, $5, $8}' chr22.vcf
This gives me the columns CHROM, POS, ID, REF, ALT and INFO.
The INFO column looks like this: GAC=1;AF=0.000199681;AN=5008;NS=2504;DP=8012;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=.|||;VT=SNP
Now, using a shell script, how can I extract AA from INFO and create a new column containing the AA alleles?
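For illustration, here is a minimal sketch of one way this could be done with awk alone. It assumes the AA value always has the form "allele|||" as in the INFO example above, so everything from the first "|" onward is dropped, and a record with no AA key at all is reported as ".". The function name extract_aa and the two-record demo file are hypothetical and only there to make the sketch runnable.

```shell
# Hypothetical two-record VCF, used only to demonstrate the idea.
demo_dir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '22\t100\trs1\tC\tG\t100\tPASS\tAC=1;AA=C|||;VT=SNP\n'
  printf '22\t200\trs2\tC\tT\t100\tPASS\tAC=1;AA=.|||;VT=SNP\n'
} > "$demo_dir/demo.vcf"

# extract_aa FILE: print CHROM POS ID REF ALT AA as tab-separated text.
extract_aa() {
  awk 'BEGIN { FS = OFS = "\t"; print "CHROM", "POS", "ID", "REF", "ALT", "AA" }
       /^#/ { next }                      # skip VCF header lines
       {
         aa = "."                         # default when INFO has no AA key
         n = split($8, info, ";")         # INFO is a ;-separated list of key=value
         for (i = 1; i <= n; i++) {
           if (info[i] ~ /^AA=/) {
             aa = substr(info[i], 4)      # drop the leading "AA="
             sub(/[|].*/, "", aa)         # assumption: keep only the part before the first "|"
             if (aa == "") aa = "."
           }
         }
         print $1, $2, $3, $4, $5, aa
       }' "$1"
}

extract_aa "$demo_dir/demo.vcf"
```

On a real file the call would be something like extract_aa chr22.vcf > chr22_AA.txt. If bcftools is available, bcftools query -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%INFO/AA\n' chr22.vcf should give a similar table, though the "|||" suffix would presumably still need to be stripped afterwards.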
Thanks
Thanks for your help! Is there also a way to keep only bi-allelic SNPs in this output using bcftools?
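With bcftools itself, the usual way to express this is bcftools view -m2 -M2 -v snps chr22.vcf, which restricts the records to exactly two alleles and to SNPs. As a rough sketch of the same idea that runs without bcftools, the awk filter below keeps only records whose REF and ALT are each a single A/C/G/T base; this is an approximation under that assumption, not a reimplementation of bcftools. The function name keep_biallelic_snps and the demo records are hypothetical.

```shell
# Hypothetical demo VCF: one bi-allelic SNP, one multi-allelic site, one indel.
snp_dir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '22\t100\trs1\tC\tG\t100\tPASS\tAA=C|||;VT=SNP\n'
  printf '22\t200\trs2\tC\tG,T\t100\tPASS\tAA=C|||;VT=SNP\n'
  printf '22\t300\trs3\tCA\tC\t100\tPASS\tAA=.|||;VT=INDEL\n'
} > "$snp_dir/demo.vcf"

# keep_biallelic_snps FILE: pass header lines through, keep only records with
# a single-base REF and a single-base ALT (so no comma-separated ALT list).
keep_biallelic_snps() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }
       $4 ~ /^[ACGTacgt]$/ && $5 ~ /^[ACGTacgt]$/ { print }' "$1"
}

keep_biallelic_snps "$snp_dir/demo.vcf"
```

Running the filter before extracting the AA column keeps the output a valid VCF at that step, so either tool can be used for the next one.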