I'm reading an interesting paper, Malachi et al., Cell Systems 2015, that talks about Variant Allele Frequency (VAF). Could someone please help me understand how they calculated this value? I couldn't find it in the methods.
Edit:
For example, let's say I have a bunch of VCF files that I would like to find VAF for. How would I go about doing that? Here's an example of a VCF file:
##fileformat=VCFv4.1
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AB,Number=1,Type=Float,Description="Allele Balance of Alt Allele">
##INFO=<ID=RD,Number=1,Type=Integer,Description="Depth of Ref allele">
##INFO=<ID=AD,Number=1,Type=Integer,Description="Depth of Alt allele">
##INFO=<ID=SAP,Number=1,Type=Float,Description="Strand Bias Probability of the Alt Allele">
##INFO=<ID=RAP,Number=1,Type=Float,Description="Strand Bias Probability of the Ref Allele">
##INFO=<ID=DP4,Number=4,Type=Float,Description="Fwd Strand Ref Counter, Rev Strand Ref Counter, Fwd Strand Alt Counter, Rev Strand Alt Counter">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT genotype
chr1 808922 . G A 249 PASS DP=49;AB=1.0000;RD=0;AD=49;SAP=0.5510;RAP=0;DP4=0,0,27,22 GT:GQ:DP 1/1:145:49
chr1 808928 . C T 249 PASS DP=51;AB=1.0000;RD=0;AD=51;SAP=0.5294;RAP=0;DP4=0,0,27,24 GT:GQ:DP 1/1:145:51
chr1 876499 . A G 217 PASS DP=49;AB=1.0000;RD=0;AD=49;SAP=0.6939;RAP=0;DP4=0,0,34,15 GT:GQ:DP 1/1:126:49
Thanks for the reply. Would you be able to provide an example pipeline for finding VAF from a VCF file? (please see edit)
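The referenced script parses allele counts from an mpileup-format file. In the header of the VCF snippet you attached, you can see:
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AD,Number=1,Type=Integer,Description="Depth of Alt allele">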
The variant allele frequency (assuming that the germline is homozygous reference) is
VAF = AD / DP = Depth of Alt Allele / Total Depth
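As a rough illustration of that formula, here is a minimal Python 3 sketch that reads VCF records and computes AD / DP from the INFO column. It assumes the INFO column carries `AD` (alt depth) and `DP` (total depth) as single integers, as declared in the header of the file above. The function names `parse_info` and `vaf_from_vcf_lines` are made up for this example and are not part of any library. Other variant callers may store allele depths differently (for example, as a per-sample FORMAT field), so check your own header first.

```python
def parse_info(info_field):
    """Turn 'DP=49;AB=1.0000;...' into a dict; flag-style keys map to True."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def vaf_from_vcf_lines(lines):
    """Yield (chrom, pos, ref, alt, vaf) per record, with vaf = INFO AD / INFO DP.

    Assumes AD and DP are single integers in the INFO column (column 8),
    as in the example VCF in this thread.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()  # VCF is tab-separated; split() handles tabs or spaces
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        info = parse_info(fields[7])
        depth = int(info["DP"])
        alt_depth = int(info["AD"])
        # Guard against zero coverage rather than raising ZeroDivisionError
        vaf = alt_depth / depth if depth > 0 else float("nan")
        yield chrom, pos, ref, alt, vaf


# Records copied from the example VCF above
example = [
    "chr1 808922 . G A 249 PASS DP=49;AB=1.0000;RD=0;AD=49;SAP=0.5510;RAP=0;DP4=0,0,27,22 GT:GQ:DP 1/1:145:49",
    "chr1 808928 . C T 249 PASS DP=51;AB=1.0000;RD=0;AD=51;SAP=0.5294;RAP=0;DP4=0,0,27,24 GT:GQ:DP 1/1:145:51",
]

for chrom, pos, ref, alt, vaf in vaf_from_vcf_lines(example):
    print(f"{chrom}\t{pos}\t{ref}>{alt}\tVAF={vaf:.3f}")
```

To run it on a file, pass an open file handle instead of the list, e.g. `vaf_from_vcf_lines(open("sample.vcf"))`, where `sample.vcf` is a placeholder name.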
Thank you! Very helpful indeed.
But what do you mean by "assuming the germline is homozygous reference"?
By that I mean that in the germline (normal cells), the individual has two copies of the reference allele. By mentioning this, I am trying to exclude cases where the germline is heterozygous and the reference allele is lost through a somatic copy number event (loss of heterozygosity, LOH).
baseParser.py doesn't seem to work these days. Are you still maintaining it?
It works; it just needs to be run with Python 2. And thanks, Noushin, very useful!