I am working on joint genotype calling of gVCF files for 200 samples. This is my first time working with VCF data. I understand the format, but I am struggling with some basic questions:
What is a reference allele, e.g. in the hg19 file from 1000 Genomes Phase 3? Is it the allele seen in most people's sequences?
What is an alternate allele? Is it the minor allele at that variant position?
What is the NON_REF allele in the ALT column?
Thanks.. This is awesome!
Happy to help. If an answer was helpful, you should upvote it; if it resolved your question, you should mark it as accepted.
Cheers, Wouter
Refer to the G5, G5A, KG and KG-PROD tags in dbSNP (use the dbSNP build for the NCBI genome equivalent to hg19) to find out the reference allele frequency. dbSNP includes allele frequencies from the 1000 Genomes and HapMap projects.
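
If it helps, here is a minimal Python sketch of how you could pull those tags out of a dbSNP VCF record. The example line is made up for illustration (it is not a real dbSNP entry), the function names are my own, and the exact tag spellings depend on the dbSNP build, so check the `##INFO` header lines of your file first.

```python
def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag-style keys (no '=') map to True."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def describe_record(vcf_line):
    """Return the ID, REF, ALT alleles and INFO flags of one tab-separated VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    rsid, ref, alt = fields[2], fields[3], fields[4]
    info = parse_info(fields[7])
    return {
        "id": rsid,
        "ref": ref,
        "alt": alt.split(","),  # ALT can list several alleles, comma-separated
        "flags": sorted(k for k, v in info.items() if v is True),
    }


# Hypothetical record, for illustration only (not taken from dbSNP).
example = "1\t12345\trs0000000\tA\tG\t.\t.\tG5;G5A;VC=SNV"
record = describe_record(example)
print(record)
```

The same `describe_record` function also works on a gVCF line, where the ALT column may contain `<NON_REF>` as one of the comma-separated entries.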