MAF = minor allele frequency, i.e. the frequency of the less common (minor) allele of a variant in the POPULATION.
You have a set of genotyped samples, and then:
1- For a given variant, calculate the frequency of one of the alleles (usually the non-reference allele, a):
freq(a) = (2 x n(aa) + n(Aa)) / (2 x N), where n(aa) and n(Aa) are the numbers of samples with genotype aa and Aa, and N is the total number of genotyped samples
2- freq(A) = 1 - freq(a)
3- Now comes the practical, and problematic, question of which allele is the minor one. The non-reference allele is not necessarily the less frequent one. Even if it is in your population, it may not be the less frequent allele in another population. And if freq(a) is 0.49, the next batch of samples for this SNP may give freq(a) = 0.51, and then the minor allele is A instead of a.
So whenever you calculate a MAF, you need to state explicitly which allele it refers to (the MAF allele) for that variant. Don't assume it is the non-reference one.
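A minimal Python sketch of steps 1-3, assuming biallelic diploid genotype counts for one variant. The function names and the "A"/"a" labels are illustrative only, not from any tool. It returns the MAF together with the allele it refers to, so the minor allele is never silently assumed to be the non-reference one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MafResult:
    maf: float        # frequency of the minor allele
    maf_allele: str   # which allele that frequency refers to


def allele_freqs(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float]:
    """Return (freq(a), freq(A)) from diploid genotype counts."""
    n_samples = n_AA + n_Aa + n_aa
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    # Each aa sample carries two copies of a, each Aa sample one copy.
    freq_a = (2 * n_aa + n_Aa) / (2 * n_samples)
    return freq_a, 1.0 - freq_a


def minor_allele(n_AA: int, n_Aa: int, n_aa: int,
                 ref: str = "A", alt: str = "a") -> MafResult:
    """Report the MAF and, explicitly, which allele is minor."""
    freq_a, freq_A = allele_freqs(n_AA, n_Aa, n_aa)
    # On an exact 0.5 tie the choice is arbitrary; this sketch reports the alt allele.
    if freq_a <= freq_A:
        return MafResult(freq_a, alt)
    return MafResult(freq_A, ref)
```

For example, with 10 AA, 30 Aa and 60 aa samples, freq(a) = 0.75, so the MAF is 0.25 and the minor allele is the reference allele A.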
VAF = variant allele FRACTION, i.e. the fraction of the reads at a locus that carry the variant allele (I prefer "fraction" over "frequency" because I come from a population genetics background, where I use "frequency" for populations, not for read sampling).
VAF is used mainly in two scenarios:
germline genotyping: in a diploid organism, the VAF helps to check whether genotype calling went well. At a heterozygous locus with depth > 80, the VAF should be close to 50%. With depth 30-40, the VAF of a het can vary between 0.35 and 0.65.
A VAF around 0.25 could mean that there is another copy of this region in the genome: one copy is "AA" and the other "Aa", so 25% of the reads carry "a".
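To get a feel for how much a het VAF can wander from 0.5 just by read sampling, here is a minimal sketch assuming the number of alt reads follows a Binomial(depth, 0.5) distribution. That is a simplification: it ignores reference bias, mapping and sequencing errors, and overdispersion, so real data will usually spread wider. The function names are hypothetical.

```python
from __future__ import annotations

from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def het_vaf_range(depth: int, p: float = 0.5, alpha: float = 0.05) -> tuple[float, float]:
    """Central (1 - alpha) range of VAF expected from read sampling alone.

    Assumes alt reads ~ Binomial(depth, p); p = 0.5 for a clean diploid het.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Lower / upper quantiles: smallest k whose CDF reaches alpha/2 and 1 - alpha/2.
    lo = next(k for k in range(depth + 1) if binom_cdf(k, depth, p) >= alpha / 2)
    hi = next(k for k in range(depth + 1) if binom_cdf(k, depth, p) >= 1 - alpha / 2)
    return lo / depth, hi / depth


for dp in (10, 20, 30):
    lo, hi = het_vaf_range(dp)
    print(f"depth {dp}: het VAF roughly {lo:.2f}-{hi:.2f}")
```

At depth 10 this gives 0.20-0.80, and at depth 30 about 0.33-0.67, in line with the 0.35-0.65 quoted for depth 30-40. Passing p=0.25 gives the spread expected for the duplicated-region case.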
cancer genotyping: in cancer, a sample is a mix of cells, each with its own mutations. Here we use the VAF as a proxy for how many cells belong to each cancer cell lineage (this is like the MAF for the cancer cells in the sequenced material).
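A very rough sketch of turning a VAF into the fraction of cells carrying a mutation, under strong assumptions: the locus is diploid in every cell, the mutation is on one of the two copies, and the non-tumour fraction (1 - purity) is wild type. The function name and the purity parameter are hypothetical, not from the thread. Copy-number changes or cell-to-cell ploidy variation, common in cancer cell lines, break this simple relation.

```python
def cell_fraction_from_vaf(vaf: float, purity: float = 1.0) -> float:
    """Rough fraction of tumour cells carrying a heterozygous SNV.

    Under the diploid, no-copy-number-change model:
        expected VAF = purity * cell_fraction / 2
    so  cell_fraction = 2 * VAF / purity.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be in [0, 1]")
    # Cap at 1: sampling noise can push a clonal mutation's VAF above purity / 2.
    return min(1.0, 2.0 * vaf / purity)
```

With purity close to 1 (plausible for a cell line with no normal contamination), a VAF of 0.25 would correspond to roughly half of the cells.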
Thank you so much for your detailed answer!
Now the difference between MAF and VAF is clear to me, and I am looking at the variant allele fraction, not MAF. I have a couple of cancer samples (actually cancer cell lines), and I want to look at the heterogeneity within each sample. Of course I expect very high heterogeneity, even more so because the ploidy can vary from cell to cell.
I have three remaining questions: 1) Regarding germline genotyping: what could have gone wrong, and how can I check for that in cancer samples, where high heterogeneity is expected? Moreover, my depth is lower [10-30]; what VAF range could be expected for a het there?
2) Since it's called the VARIANT allele fraction, I assume it's always AD (second entry)/DP, as I stated above. Correct?
3) What about 1/2 SNPs? Do I then have two VAF values?
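A minimal sketch of the VAF calculation from a sample's FORMAT/AD values, assuming AD lists the reference depth first and then one depth per ALT allele in ALT order, as the VCF specification defines it. For a 1/2 genotype it simply returns one value per ALT allele. The denominator is a choice: sum(AD) or FORMAT/DP. In some callers (GATK, for example) the two can differ, because reads that do not clearly support any allele are left out of AD. The function name is hypothetical.

```python
from __future__ import annotations

import math
from typing import Optional


def vaf_per_alt(ad: list[int], dp: Optional[int] = None) -> list[float]:
    """VAF for each ALT allele from a FORMAT/AD list [ref, alt1, alt2, ...].

    If dp is given (e.g. FORMAT/DP) it is used as the denominator;
    otherwise sum(AD) is used.
    """
    if len(ad) < 2:
        raise ValueError("AD needs a REF depth and at least one ALT depth")
    denom = dp if dp is not None else sum(ad)
    if denom == 0:
        # No reads at this site: VAF is undefined.
        return [math.nan] * (len(ad) - 1)
    return [alt_depth / denom for alt_depth in ad[1:]]


print(vaf_per_alt([12, 8]))         # 0/1 site -> [0.4]
print(vaf_per_alt([1, 10, 9]))      # 1/2 site -> [0.5, 0.45]
print(vaf_per_alt([12, 8], dp=25))  # FORMAT/DP as denominator -> [0.32]
```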
Again, thanks so much!!
Hi there, do you have any recommended material for understanding variant allele frequency and its calculation? Thank you!
With regard to adding VAF values, consider using:
See bcftools plugin docs here:
https://samtools.github.io/bcftools/howtos/plugin.fill-tags.html