Question

Comparison of exome variant allele frequencies


7.7 years ago

calven01 • 0

Hey everyone,

First time posting and trying to get my head around bioinformatics, so apologies if there is a simple answer to this.

I currently have whole exome sequencing data from human tumours and from PDX mice. For each sample I have calculated the allele frequency of every variant present. Is there a method to compare the allele frequencies between samples to assess whether their difference is statistically significant? I was thinking of using a Mann-Whitney U test to compare the mean allele frequency, but I thought this might be an oversimplification.

Thanks anyone for your help.

Nick

exome • 2.6k views

updated 3.9 years ago by Biostar 20 • written 7.7 years ago by calven01 • 0

score 2 · Answer 1 · 2017-03-27

It is an oversimplification. Allelic fractions are a function of tumor purity, allele-specific copy number, cellular fraction (the percentage of tumor cells carrying the mutation), and mapping biases. You are probably interested in differences in cellular fraction, so you need to adjust for any differences in purity, copy number (which would most likely be biologically relevant and also interesting), or biases. You can safely assume that the mapping biases are similar.

If you think there is no change in copy number, all you need to do is to get an estimate of the difference in tumor purity, for example by plotting allelic fractions of all somatic variants against each other. (Heterozygous, diploid, mono-clonal somatic mutations have an expected allelic fraction of half the tumor purity.)
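As a rough sketch of that relationship and of the purity adjustment (not taken from any of the tools mentioned in this answer): the function names, the `min_af` cutoff, and the use of a median are illustrative assumptions. It assumes matched variants, no copy number change between samples, and that most variants are clonal and heterozygous, so the ratio of allelic fractions approximates the ratio of purities.

```python
from statistics import median


def expected_allelic_fraction(purity, multiplicity=1, tumor_cn=2,
                              normal_cn=2, cellular_fraction=1.0):
    """Expected allelic fraction of a somatic variant.

    purity: fraction of cells in the sample that are tumor cells.
    multiplicity: mutated copies per mutated tumor cell.
    cellular_fraction: fraction of tumor cells carrying the mutation.
    """
    mutated_copies = purity * cellular_fraction * multiplicity
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutated_copies / total_copies


def estimate_purity_ratio(tumor_af, pdx_af, min_af=0.05):
    """Median PDX/tumor allelic fraction ratio over matched variants.

    Variants below min_af in either sample are skipped (hypothetical
    cutoff) because their ratios are dominated by sampling noise.
    """
    ratios = [p / t for t, p in zip(tumor_af, pdx_af)
              if t >= min_af and p >= min_af]
    if not ratios:
        raise ValueError("no variants pass the min_af filter")
    return median(ratios)


if __name__ == "__main__":
    # Heterozygous, diploid, clonal variant at 60% purity: half the purity.
    print(expected_allelic_fraction(0.6))
    # Toy allelic fractions for the same variants in tumor and PDX.
    print(estimate_purity_ratio([0.20, 0.25, 0.30], [0.40, 0.50, 0.60]))
```

The median is used here only because it is less sensitive than the mean to the few variants whose cellular fraction really did change; a scatter plot of the two allelic fraction vectors remains the better sanity check.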

You could then use a binomial or beta-binomial distribution to calculate the probability of observing the adjusted number of variant reads in the PDX, given the coverage and the adjusted allelic fraction in the tumor.
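A minimal per-variant sketch of such a test, using only the Python standard library. The function names, the `purity_ratio` argument, and the overdispersion parameter `rho` are assumptions made for illustration. It also treats the tumor allelic fraction as known exactly, which ignores sampling noise in the tumor sample.

```python
from math import exp, lgamma, log


def _log_choose(n, k):
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def _log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_pmf(k, n, p, rho=0.0):
    """Log probability of k variant reads out of n.

    rho == 0 gives the binomial; rho in (0, 1) gives a beta-binomial
    with mean p and overdispersion rho (assumed, not estimated here).
    """
    if rho <= 0.0:
        return _log_choose(n, k) + k * log(p) + (n - k) * log(1.0 - p)
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    return _log_choose(n, k) + _log_beta(k + a, n - k + b) - _log_beta(a, b)


def pdx_shift_pvalue(tumor_alt, tumor_depth, pdx_alt, pdx_depth,
                     purity_ratio=1.0, rho=0.0):
    """Two-sided p-value for the PDX variant read count.

    The expected PDX allelic fraction is the tumor allelic fraction
    scaled by purity_ratio (PDX purity / tumor purity).
    """
    expected = purity_ratio * tumor_alt / tumor_depth
    expected = min(max(expected, 1e-6), 1.0 - 1e-6)  # keep log() finite
    observed = log_pmf(pdx_alt, pdx_depth, expected, rho)
    total = 0.0
    # Sum the probability of every outcome no more likely than the observed one.
    for k in range(pdx_depth + 1):
        lp = log_pmf(k, pdx_depth, expected, rho)
        if lp <= observed + 1e-9:
            total += exp(lp)
    return min(1.0, total)


if __name__ == "__main__":
    # Toy counts: 30/100 variant reads in tumor, 45/100 in PDX.
    print(pdx_shift_pvalue(30, 100, 45, 100))            # binomial
    print(pdx_shift_pvalue(30, 100, 45, 100, rho=0.05))  # beta-binomial
```

The beta-binomial variant gives larger p-values for the same counts because it allows extra read-count variability beyond binomial sampling; a sensible value for `rho` would have to be estimated from the data.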

superFreq would be a tool pretty much designed for this problem (https://github.com/ChristofferFlensburg/superFreq, not yet published AFAIK). We wrote a similar tool for providing posterior probabilities for all possible genotypes (PureCN in Bioconductor), especially for clinical samples without matched normals.

If you have matched normals, you can use tools like PyClone or SciClone, which in addition cluster your variants into clones.