Question

Calculation of VAF (variant allele frequency)

7


6.2 years ago

JJ ▴ 760

Hi all,

So, I am interested in computing the VAF (variant allele frequency) by extracting values from a VCF file. From my understanding, the VAF is calculated as follows: AD(second entry)/DP so e.g.

AD = 8,4 
DP = 12
VAF = 4/12

AD = 5,10
DP = 15
VAF = 10/15

Is this correct?

However, I've seen that sometimes the frequency of the most frequent (or less frequent) allele is computed. Hence it would be 8/12 and 10/15 (or 4/12 and 5/15 for the less frequent allele).

Or is this the difference between the AF (allele frequency), MAF (minor allele frequency) and VAF (variant allele frequency)? (also see MAF vs VAF on this topic)

And what about 1/2 SNPs (two different alternate alleles)? Do I have two values for VAF then?

AD = 0,5,10
DP = 15
VAF1 = 5/15
VAF2 = 10/15

I am slightly confused here. Thanks for your input!

sequencing • 25k views

updated 2.4 years ago by johnston.mike.j ▴ 60 • written 6.2 years ago by JJ ▴ 760

score 12 · Answer 1 · 2019-02-02

MAF = minor allele frequency: the frequency of the less common allele in the POPULATION.

You have a bunch of genotyped samples, and then:

1- You calculate the frequency of one of the alleles (usually the non-reference allele) for a given variant:

 freq(a) =( sum(samples_with_geno_aa x 2) + sum(samples_with_geno_Aa)) / (samples x 2)

2- freq(A) = 1-freq(a)

3- Now comes the practical and problematic issue of "minor". The non-reference allele is not necessarily the less frequent one. Or, if it is the less frequent allele in your population, it may not be the less frequent one in another population. Or, if freq(a) is 0.49, the next batch of samples for this SNP could give freq(a) = 0.51, and then the minor allele is A instead of a.

So whenever you calculate a MAF, you need to state explicitly which allele is the minor allele for that variant. Don't expect it to be the non-reference one.
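The steps above can be sketched in a few lines of Python. This is only an illustration, assuming biallelic diploid genotypes written as two-character strings such as "AA", "Aa", "aa"; the function names and the example cohort are made up for the sketch.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as 'AA', 'Aa', 'aa'."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each diploid sample contributes two alleles
    total = sum(counts.values())  # equals samples x 2
    return {allele: n / total for allele, n in counts.items()}


def minor_allele(freqs):
    """Return (allele, frequency) for the least frequent allele observed."""
    allele = min(freqs, key=freqs.get)
    return allele, freqs[allele]


# Hypothetical cohort: 6 AA, 3 Aa, 1 aa
cohort = ["AA"] * 6 + ["Aa"] * 3 + ["aa"] * 1
freqs = allele_frequencies(cohort)
maf_allele, maf = minor_allele(freqs)
print(freqs, maf_allele, maf)
```

Returning the allele together with its frequency makes the "which allele is minor" part explicit, so the answer does not silently change meaning when another cohort flips the two frequencies. Note that an allele that is never observed does not appear in the result at all.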

VAF = variant allele FRACTION. (I prefer "fraction" to "frequency": I come from a population genetics background, so I use "frequency" for populations, not for read sampling.)

VAF is used mainly in two scenarios:

germline genotyping: in a diploid organism the VAF helps to check whether genotype calling went well. The VAF of a heterozygous locus with depth > 80 should be near 50%. With a depth of 30-40, the VAF of a het could vary between 0.35 and 0.65.

A VAF around 0.25 could mean that there is another copy of this region in the genome: one copy is "AA" and the other is "Aa" (so 25% "a").
cancer genotyping: in cancer, a single sample is a mix of cell lineages, each with its own mutations. Here we use the VAF as a proxy for how many cells belong to each cancer cell lineage (this is like the MAF for the cancer cells in the sequenced material).
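To connect this back to the AD/DP numbers in the question, here is a minimal Python sketch. It assumes AD is the usual VCF list ordered as ref, alt1, alt2, ... and gives one VAF per ALT allele, so a 1/2 site gets two values. The function names are hypothetical, and the het range is only a rough normal approximation to the binomial, assuming both alleles are sampled equally with no reference bias.

```python
import math


def vaf_per_alt(ad, dp=None):
    """VAF for each ALT allele from a VCF AD list [ref, alt1, alt2, ...].

    The denominator is DP when given, otherwise the sum of AD.
    """
    depth = dp if dp is not None else sum(ad)
    if depth == 0:
        return [float("nan")] * (len(ad) - 1)  # no reads, VAF undefined
    return [alt / depth for alt in ad[1:]]


def het_vaf_range(depth, z=1.96):
    """Approximate range of the observed VAF for a true het at a given depth."""
    sd = math.sqrt(0.25 / depth)  # binomial sd of a proportion with p = 0.5
    return 0.5 - z * sd, 0.5 + z * sd


print(vaf_per_alt([8, 4], dp=12))      # one ALT allele: 4/12
print(vaf_per_alt([0, 5, 10], dp=15))  # 1/2 site: 5/15 and 10/15
print(het_vaf_range(35))               # spread expected at depth ~35
```

Depending on the caller, DP and the sum of AD are not always equal (some callers count reads in DP that are not assigned to any allele in AD), so it is worth deciding which denominator you want and using it consistently.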