Hi,
Generally, in cancer variation studies, the variant allele fraction (VAF) is calculated using this formula: alt reads / total reads at the locus.
In a VCF file, the FORMAT/AD tag has two values, e.g. 43,45, where the numbers are the allelic depths for the ref and alt alleles for a sample, in the order listed.
The FORMAT field also has the DP tag, which is the total depth. In short, the difference between AD and DP is that a ref- or alt-supporting read is counted towards AD only if it is informative, whereas DP counts both informative and uninformative reads. More information can be found here: https://gatk.broadinstitute.org/hc/en-us/articles/360035532252-Allele-Depth-AD-is-lower-than-expected
My question is: since the sum of the ref and alt reads from the AD tag may not match the DP tag, what is the best way to calculate VAF? Can I focus only on the AD tag and do this calculation: alt reads / (ref reads + alt reads), or is it okay to do this calculation: alt reads / DP?
Regards. Prasun
Since the AD value reflects how many reads actually contributed support for a given allele at the site, I would focus only on the AD tag. However, it can be complicated, because DP and GT may not agree with a VAF calculated from AD. Another problem is that what you see in IGV may differ from the VAF calculated from the AD values. It would probably be a good idea to talk to the people who will use the data about this possible difference.
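To make the two options from the question concrete, here is a minimal Python sketch. It is not taken from GATK or bcftools; the function names are made up for illustration, and the DP value of 95 is an invented number to go with the 43,45 example above. For multi-allelic sites it assumes the denominator is the sum of all AD values, which is one possible convention.

```python
from typing import Optional


def vaf_from_ad(ad: str, alt_index: int = 1) -> Optional[float]:
    """VAF = alt reads / sum of all AD values (informative reads only).

    `ad` is a FORMAT/AD string such as "43,45" (ref first, then alts).
    Returns None when AD is missing or no informative reads exist.
    """
    fields = ad.split(",")
    if "." in fields or len(fields) <= alt_index:
        return None
    depths = [int(x) for x in fields]
    total = sum(depths)
    if total == 0:
        return None
    return depths[alt_index] / total


def vaf_from_dp(ad: str, dp: int, alt_index: int = 1) -> Optional[float]:
    """VAF = alt reads / DP (denominator also includes uninformative reads)."""
    fields = ad.split(",")
    if "." in fields or len(fields) <= alt_index or dp <= 0:
        return None
    return int(fields[alt_index]) / dp


# AD from the example in the question; DP = 95 is a hypothetical value.
print(round(vaf_from_ad("43,45"), 3))      # 0.511
print(round(vaf_from_dp("43,45", 95), 3))  # 0.474
```

Whenever DP is larger than the AD sum, the DP-based value comes out lower, which is why it matters to state which denominator was used.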
Thanks! I am actually going to do that now.
There was a similar question previously: VCF AF and %Freq
Hi,
Thanks for this, but I don't think it answers my question. INFO/AF only gives the proportion of alternate alleles at a site. For example, it can be 0.5 for a locus with a heterozygous genotype (0/1) in a single-sample VCF. The VAF calculation, on the other hand, takes into account the number of reads supporting the alternate and reference alleles.
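A small Python sketch of the genotype-based number described in this post, assuming AF is computed from called genotypes (alt allele count over total called alleles). The function name is hypothetical, and callers differ in how they define AF, so the VCF header is the place to check what a given file means by it.

```python
import re
from typing import List, Optional


def genotype_allele_frequency(genotypes: List[str], alt_allele: int = 1) -> Optional[float]:
    """Fraction of called alleles equal to `alt_allele`, from GT strings.

    Accepts unphased ("0/1") and phased ("0|1") genotypes; missing
    alleles (".") are skipped. Read counts play no role here.
    """
    called = [a for gt in genotypes for a in re.split(r"[/|]", gt) if a != "."]
    if not called:
        return None
    return called.count(str(alt_allele)) / len(called)


# A single heterozygous sample gives 0.5 regardless of how many reads
# support each allele.
print(genotype_allele_frequency(["0/1"]))  # 0.5
```

The result depends only on the genotype calls, so a 0/1 site reports 0.5 whether the read split is 43 vs 45 or 80 vs 8.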
VAF and AF can both refer to allele frequency. The resources in the other thread are specifically related to your original question.
Thanks Igor. I understand. Unfortunately, the first two links (where I believe my answer lies) are not opening. Let me check the GATK website itself; they may have changed the links. Thanks a lot again!
Hi Prasun. Did you find a way to determine VAF from VCF files? I have to find out Allelic Fractions from my WES data (BAM/VCF) and I don't know how to do it. :(
Hi,
Yes, please see this: https://github.com/samtools/bcftools/issues/1422
If it's a multi-sample VCF file, you need to use the correct tags to calculate it on a per-sample basis. You can get an idea from here: https://github.com/samtools/bcftools/issues/1731
For both cases, try getting the latest version because the second one won't work unless you have at least bcftools v1.15.
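For anyone who wants to see the per-sample arithmetic without bcftools, here is a rough plain-Python sketch. It is independent of the commands in the linked issues. The function name, the sample names (TUMOR, NORMAL) and the example record are invented for illustration, and it assumes a tab-separated VCF data line whose FORMAT column contains AD.

```python
from typing import Dict, List, Optional


def per_sample_vaf(vcf_line: str, sample_names: List[str]) -> Dict[str, Optional[float]]:
    """Return {sample: alt reads / sum(AD)} for the first alt allele.

    Columns 1-8 are the fixed VCF fields, column 9 is FORMAT, and the
    sample columns follow. None is returned for a sample when AD is
    absent, missing, or sums to zero.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    format_keys = cols[8].split(":")
    result: Dict[str, Optional[float]] = {}
    for name, sample_field in zip(sample_names, cols[9:]):
        values = sample_field.split(":")
        vaf = None
        if "AD" in format_keys:
            ad_idx = format_keys.index("AD")
            if ad_idx < len(values):
                parts = values[ad_idx].split(",")
                if "." not in parts and len(parts) >= 2:
                    depths = [int(x) for x in parts]
                    if sum(depths) > 0:
                        vaf = depths[1] / sum(depths)
        result[name] = vaf
    return result


# Hypothetical two-sample record; every value here is made up.
record = "chr1\t12345\t.\tA\tG\t.\tPASS\t.\tGT:AD:DP\t0/1:43,45:95\t0/0:60,0:61"
print(per_sample_vaf(record, ["TUMOR", "NORMAL"]))
```

In a real file the sample names would come from the `#CHROM` header line rather than being passed in by hand.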
Thank you for getting back!
Hi Prasun. So I called my somatic variants using GATK Mutect2. I have 4 individual sample VCF files which I generated using the Tumor-matched normal mode (Tumor+Blood DNA for 4 samples). There are a number of terminal command lines in the link you shared. Which one should I use? Also, would it work for VCF files generated from Mutect2? Thanks for the help.
I am sorry, I am not very familiar with Mutect2 output, but it should be a regular VCF file. If it's a single-sample VCF file, you can use this command: