Hi there,
I am working with some VCF files and I find it difficult to understand some of the tags in the INFO column of VCF files.
1) Does NS (number of samples with data) refer to the number of samples with genotype "0/1" or "1/1"?
2) Does AC (allele count in genotypes) refer to the number of "1" in the genotypes "0/1" and "1/1"?
3) Does AN (total number of alleles in called genotypes) refer to the number of "0" and "1" in all three genotypes namely "0/0", "0/1" and "1/1"?
4) Should AF (allele frequency) be calculated as AF = AC/AN, where AC is the allele count in genotypes and AN is the total number of alleles in called genotypes? Or should another formula be used, AF = AC/(number of samples * 2)? Usually AN is not equal to (number of samples * 2) because of missing alleles in the GT field.
5) I see some INFO tags in VCF files generated by bcftools (plugin fill-tags) where AF = AC/(number of samples * 2) is used rather than AF = AC/AN. Why is that? In addition, the NS and AN values provided by bcftools (plugin fill-tags) do not seem correct when I compare them with my manual counts based on the definitions above. Has anyone come across this before?
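To make the manual counting concrete, here is a minimal Python sketch of one way to count these tags for a single site. It is only an illustration under stated assumptions, not how bcftools computes them: the site is assumed to be biallelic, "samples with data" is assumed to mean samples with at least one non-missing allele in GT, and the function name count_tags and the example genotypes are made up for this post.

```python
import re

def count_tags(genotypes):
    """Count NS, AC, AN and AF for one biallelic site from GT strings.

    Assumption: a sample "has data" if at least one allele in its GT
    is not missing ("."). This is one reading of NS, not an official one.
    """
    ns = 0  # samples with at least one called allele
    ac = 0  # number of ALT ("1") alleles among called alleles
    an = 0  # number of called alleles ("0" or "1", never ".")
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)  # handles unphased "/" and phased "|"
        called = [a for a in alleles if a != "."]
        if called:
            ns += 1
        an += len(called)
        ac += sum(1 for a in called if a == "1")
    af = ac / an if an else None  # AF = AC/AN, undefined if nothing is called
    return {"NS": ns, "AC": ac, "AN": an, "AF": af}

# Hypothetical site with five samples, one of them missing
example = ["0/0", "0/1", "1/1", "./.", "0|1"]
tags = count_tags(example)
print(tags)

# The alternative formula from question 4, using all samples
af_all_samples = tags["AC"] / (len(example) * 2)
print(af_all_samples)
```

With one missing genotype out of five, AN is 8 rather than 10, so AC/AN and AC/(number of samples * 2) give different values for the same site, which is exactly the discrepancy asked about in questions 4 and 5.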
Many thanks,
Kind regards,
Tom
Dear Tom, did you resolve this? I need this exact information. Please share what you finally found out.