When you study the subclonality of a tumor, for example with a tool like SciClone or PyClone, you have to adjust the frequencies of the mutations present in the tumor for the tumor content of the biopsy, also known as the tumor purity.
As far as I know, there are three methods for correcting for tumor purity when studying the variant allele frequency of a mutation present in a tumor.
- Using the tumor content estimated by a pathologist using histochemical staining
- Using the tumor purity estimated by software such as ABSOLUTE or ASCAT
- Setting what seems to be the founder clone, meaning the clone that contains all mutations common to all tumor cells, to 50%, and then scaling the rest of the mutations accordingly, as described in this paper:
The cluster most representative of clonally dominant diploid heterozygous sSNVs in each patient is indicated by an asterisk in the patient legend. Tumor content is calculated by multiplying the mean variant allele frequency of this cluster by two in each time point.
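To make method 3 concrete, here is a minimal sketch of how I understand the calculation. It assumes the founder cluster consists of clonal, heterozygous mutations in diploid regions, as in the quote above. The function names and the example VAFs are hypothetical and not taken from the paper.

```python
from statistics import mean


def purity_from_founder_cluster(founder_vafs):
    """Estimate purity as 2 x mean VAF of the presumed founder cluster.

    Assumption: every mutation in the cluster is clonal, heterozygous
    and located in a diploid region of the tumor genome.
    """
    if not founder_vafs:
        raise ValueError("founder cluster is empty")
    return min(1.0, 2.0 * mean(founder_vafs))


def cancer_cell_fraction(vaf, purity):
    """Rescale a raw VAF to the fraction of tumor cells carrying the mutation.

    Same assumptions as above (heterozygous, diploid).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    return min(1.0, 2.0 * vaf / purity)


# Hypothetical raw VAFs of the cluster chosen as founder clone.
founder = [0.14, 0.15, 0.16]
purity = purity_from_founder_cluster(founder)

# A subclonal mutation with a raw VAF of 7.5% in the same sample.
subclone_ccf = cancer_cell_fraction(0.075, purity)
```

With these made-up numbers the founder cluster implies a purity of about 30%, and the subclonal mutation would be present in about half of the tumor cells.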
In my analysis, there are sometimes large discrepancies between the tumor purity values I get using these 3 methods. I have whole-exome sequencing data.
For example, I have 2 biopsies from 2 different time points from the same patient in which essentially no mutation exceeds an uncorrected variant allele frequency of 30%, even though the pathologist assessed those samples to have a purity of 90% (and that assessment should be fairly accurate). If I set the founding clone myself by identifying clusters (method 3), I get a tumor purity as low as 20-30%.
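As a rough sanity check on these numbers, the sketch below computes the VAF I would expect for a clonal mutation at a given purity. It uses the usual tumor/normal mixture formula and assumes a diploid normal genome. The function names, and the defaults of tumor copy number 2 and one mutated copy, are my own assumptions for illustration.

```python
def expected_clonal_vaf(purity, tumor_cn=2, mutated_copies=1, normal_cn=2):
    """Expected VAF of a mutation present in all tumor cells.

    Mixture model: mutated alleles come only from tumor cells, total
    alleles come from tumor cells and contaminating normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    mutated = purity * mutated_copies
    total = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutated / total


def implied_purity_diploid_het(vaf):
    """Purity implied by a clonal, heterozygous, diploid mutation."""
    return min(1.0, 2.0 * vaf)


# What the pathologist's estimate of 90% purity would predict.
vaf_at_90 = expected_clonal_vaf(0.90)

# What the highest observed raw VAF (about 30%) would imply instead.
purity_from_max_vaf = implied_purity_diploid_het(0.30)
```

Under these assumptions, 90% purity predicts clonal heterozygous mutations at around 45% VAF, whereas a maximum VAF of 30% corresponds to a purity of at most about 60%. Passing a different `tumor_cn` shows how copy number gains would lower the expected VAF without any change in purity.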
So one hypothesis could be that maybe there is no such "mutation driven" founder clone present in that sample, and that other events such as epigenetic changes, or rearrangements, could have been the early events that "established" the tumor. But if so, does it make sense biologically to have all somatic point mutations common to two biopsies from two different time points not exceeding 30%? Wouldn't I expect at the second time point to see at least one clone around 50%?
That's an interesting thought. For my samples I have exome sequencing data of 300x depth. At such a high depth I would expect the mutations present in all tumor cells (which would be the founding clone) to be detected, as you write in this paper:
But if what you say is true, I still find it strange that I cannot detect a founding clone in either of two consecutive samples from the same patient. I could miss it in one sample because of technical issues with how the biopsy was taken or with the sequencing, but I don't think it makes sense to miss it in both, unless that founding clone consists of mutations that occurred only in the non-coding regions of the genome. I could not find examples of that in the literature, but I might of course have overlooked something.
To my mind, the only explanation that makes sense both biologically and mathematically is the one you propose: the samples I am struggling with must have a much lower tumor purity than the pathologist first estimated.
As an experienced analyst, how do you typically adjust allele frequencies for purity?