I have a list of high quality structural variant breakpoints that are the product of multiple different SV callers. While some of the approaches I use estimate the allele frequency for called breakpoints, others do not, so in some cases I have high quality variants that lack allele frequency estimates.
Because of this (and **), I would like to be able to calculate allele frequencies for a list of genomic coordinates.
Allele frequencies in this context refer to the fraction of reads supporting an alternate allele. A homozygous variant supported by all reads would have an allele frequency of 1, whereas a heterozygous would be 0.5 (assuming 100% purity of sample).
This should be possible by looking at the alignment file (.bam) and counting the number of reads supporting and disagreeing for each variant call. E.g. AR/RR+AR (alternate reads/referece reads + alternate reads).
I am currently thinking that I will have to write my own tool to do this (probably using PySam, but if anyone can suggest an existing approach, or an easy way of achieving this I would be happy to not re-invent this particular wheel!
** Also because I see a high discrepancy between the allele frequency values given by different approaches (DELLY, LUMPY+SVtyper, Meerkat, novobreak) for the same variant
Can't you do this using
samtools mpileup
?