I am interested in assessing the clonality of a tumor sample I have which is also paired with a normal tissue control from the same patient (~30x coverage). Specifically, I would like to know if there are any significant sub-clones present within the tumor sample.
We've been working on integrating clonality and heterogeneity estimation tools into bcbio (https://github.com/chapmanb/bcbio-nextgen). It's still a work in progress but we've been evaluating against internal datasets where we have external predictions of normal contamination. We've had the most success with:
Battenberg from Sanger (https://github.com/cancerit/cgpBattenberg). This calls CNVs and provides estimates of normal contamination. It's also a required input for PhyloWGS.
PhyloWGS. We don't currently have a way to validate heterogeneity but get a useful set of trees to estimate how much of a mix is in a tumor sample.
I've had decent results with THetA2, but it's designed for exome or targeted sequencing. It sounds like you're using WGS so another choice like SciClone (code) might be better for you.
If you're handy with a VCF, you can just plot the variant allele frequencies, ideally of just the somatic SNVs, and look for a cluster, or even just a few believable variants, with frequency below 0.5 (scaled down for a less-pure tumor sample).
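For what it's worth, a minimal sketch of that VAF check in plain Python might look like the following. It assumes your somatic caller writes per-allele read depths in the `AD` FORMAT field and that you know which sample column is the tumor. Both vary between callers, so check your VCF header first. The function names and the `tumor_column` parameter are made up for illustration.

```python
def tumor_vafs(vcf_lines, tumor_column=0):
    """Return VAFs of passing, biallelic SNVs, computed from the AD field.

    tumor_column is the 0-based index of the tumor among the sample
    columns (an assumption: check your caller's column order).
    """
    vafs = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 + tumor_column:
            continue  # no genotype column for the tumor
        ref, alt, filt = fields[3], fields[4], fields[6]
        if len(ref) != 1 or len(alt) != 1 or alt == "." or filt not in ("PASS", "."):
            continue  # keep only passing, biallelic SNVs
        keys = fields[8].split(":")
        values = fields[9 + tumor_column].split(":")
        if "AD" not in keys or keys.index("AD") >= len(values):
            continue  # caller did not report allelic depths here
        try:
            ref_depth, alt_depth = (int(x) for x in values[keys.index("AD")].split(","))
        except ValueError:
            continue  # missing or malformed depths
        if ref_depth + alt_depth > 0:
            vafs.append(alt_depth / (ref_depth + alt_depth))
    return vafs


def vaf_histogram(vafs, n_bins=20):
    """Count VAFs in n_bins equal-width bins covering 0 to 1."""
    counts = [0] * n_bins
    for vaf in vafs:
        counts[min(int(vaf * n_bins), n_bins - 1)] += 1
    return counts


def print_histogram(counts):
    """Print a crude text histogram, one row per bin."""
    n_bins = len(counts)
    for i, count in enumerate(counts):
        print(f"{i / n_bins:.2f}-{(i + 1) / n_bins:.2f} {'#' * count}")
```

You would call it with something like `print_histogram(vaf_histogram(tumor_vafs(open("somatic.vcf"))))`, where `somatic.vcf` is a placeholder for your own uncompressed file. A second bump well below the main heterozygous peak is the kind of thing to look for.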
We use SciClone (code) a lot. Some thoughts: tools like this infer clonal architecture by clustering the variant allele frequencies (VAFs) of somatic variants, and doing that accurately relies on several things. First, the VAF estimates must be accurate; with only 30x coverage, you will probably have a lot of variance in them. Second, other things also influence observed VAFs. Copy number events, for example, can alter VAFs and confound clonality estimation if they are not properly corrected for or excluded. Third, you need to have observed enough variants across the range from 'dominant' to sub-clonal to get a good picture. Fourth, if your variant calls are full of false positives, you may get a misleading picture. Finally, having multiple time points can be powerful. Good luck.
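To put rough numbers on the coverage and copy number points, here is a small sketch under simplifying assumptions. It treats read sampling as binomial and uses a commonly used textbook-style relation between VAF, purity, and copy number. This is not SciClone's actual model, and the function and parameter names are invented for illustration.

```python
import math


def vaf_standard_error(vaf, depth):
    """Binomial standard error of a VAF estimated from `depth` reads."""
    return math.sqrt(vaf * (1 - vaf) / depth)


def expected_vaf(ccf, purity, multiplicity=1, tumor_cn=2, normal_cn=2):
    """Expected VAF of a somatic variant under a simple mixture model.

    ccf          fraction of tumor cells carrying the variant
    purity       fraction of cells in the sample that are tumor
    multiplicity mutated copies per variant-bearing tumor cell
    tumor_cn     total copy number at the locus in tumor cells
    normal_cn    total copy number at the locus in normal cells
    """
    mutant_copies = purity * ccf * multiplicity
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return mutant_copies / total_copies
```

Under these assumptions, a clonal heterozygous variant in a pure diploid tumor has an expected VAF of 0.5, with a standard error of about 0.09 at 30x. A subclone present in half of the tumor cells has an expected VAF of 0.25, so the two are separated by under three standard errors of the clonal peak, and their VAF distributions overlap noticeably. The same function shows how a single-copy gain (`tumor_cn=3`) pulls a clonal variant down to about 0.33, which can look like a subclone if copy number is ignored.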
My initial hunch was to use PhyloWGS
I suggest CLONET
If it is not urgent, I would keep an eye out for this project