Question

Running sciClone on cell prevalence (CP) data

9.2 years ago

marki ▴ 60

Hi,

As I understand it, sciClone uses variant allele frequency (VAF) estimates from copy-number-neutral regions to infer subclonality (so, in a sense, it does not use CNV information; please correct me if I am wrong).

The exome sequencing data I have contains relatively few mutations and considerable variation in copy number. When I remove the regions that are not copy-number neutral (copy number != 2), very few mutations remain. Because I have multiple samples for each case, the resulting VAF matrix becomes very sparse, which leads to poor results.
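
To illustrate why the matrix gets sparse so quickly: with several samples per case, a mutation only contributes a complete row if it is copy-neutral in every sample. The sketch below uses hypothetical toy copy-number calls (the function names and data are illustrative only, not part of sciClone) to show how the per-sample filters intersect.

```python
from typing import Dict, List

def copy_neutral_sites(cn: Dict[str, float], neutral: float = 2.0) -> List[str]:
    """Mutation IDs that sit at copy number `neutral` in a single sample."""
    return sorted(m for m, c in cn.items() if c == neutral)

def copy_neutral_in_all(cn_by_sample: Dict[str, Dict[str, float]],
                        neutral: float = 2.0) -> List[str]:
    """Mutation IDs that are copy-neutral in every sample (a complete VAF row)."""
    per_sample = [set(copy_neutral_sites(cn, neutral)) for cn in cn_by_sample.values()]
    return sorted(set.intersection(*per_sample))

if __name__ == "__main__":
    # Hypothetical copy-number calls for 5 mutations in 3 samples of one case.
    cn_by_sample = {
        "S1": {"m1": 2, "m2": 3, "m3": 2, "m4": 2, "m5": 1},
        "S2": {"m1": 2, "m2": 2, "m3": 4, "m4": 2, "m5": 2},
        "S3": {"m1": 2, "m2": 2, "m3": 2, "m4": 1, "m5": 2},
    }
    for sample, cn in cn_by_sample.items():
        print(sample, copy_neutral_sites(cn))
    # Each sample keeps 3-4 of 5 mutations, but only one survives in all of them.
    print("all samples:", copy_neutral_in_all(cn_by_sample))
```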

I was wondering whether I can combine the VAF with copy number and ploidy to compute the cancer cell fraction (http://www.nature.com/leu/journal/v28/n1/fig_tab/leu2013248f1.html#figure-title) or, alternatively, compute cell prevalence (CP) values using PyClone or ASCAT, and feed those into sciClone. Will sciClone clustering work as it does with VAFs? If so, that's excellent; if not, can you recommend an alternative tool for subclonal reconstruction?
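
For reference, a commonly used way to get a cancer cell fraction is to solve the simple mixture model VAF = p · CCF · m / (p · CN_t + (1 − p) · 2) for CCF, where p is tumor purity, CN_t the local tumor copy number, m the number of mutated copies, and the normal cells are assumed diploid. I have not checked that this matches the linked figure exactly, so treat it as an assumption. The sketch also maps a CCF back to the VAF a heterozygous mutation would show at a copy-neutral site, which is one way to get a "VAF-equivalent" value; the function names are hypothetical.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: float,
                         multiplicity: int = 1, normal_cn: float = 2.0) -> float:
    """Estimate CCF by solving
    VAF = purity * CCF * multiplicity / (purity * tumor_cn + (1 - purity) * normal_cn)
    for CCF. Assumes diploid normal cells; the result is capped at 1.0.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    ccf = vaf * total_copies / (purity * multiplicity)
    return min(ccf, 1.0)

def vaf_equivalent(ccf: float, purity: float = 1.0) -> float:
    """VAF that a heterozygous mutation with this CCF would show at a
    copy-neutral (CN = 2) site in a sample of the given purity."""
    return purity * ccf / 2.0

if __name__ == "__main__":
    # Hypothetical values: 60% purity, copy-neutral site, observed VAF 0.15.
    ccf = cancer_cell_fraction(vaf=0.15, purity=0.6, tumor_cn=2)
    print(round(ccf, 3), round(vaf_equivalent(ccf, purity=0.6), 3))
```

Capping at 1.0 hides cases where the assumed copy number or multiplicity is wrong; in practice those sites are worth inspecting rather than silently clipping.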

Thanks,
Ikram

sciclone subclonal-analysis clonality cancer

updated 2.8 years ago by Ram 45k • written 9.2 years ago by marki ▴ 60

Ram · Answer 1 · 2016-02-17

9.2 years ago

Chris Miller 22k

Hi Ikram, I'm going to close your GitHub issue and post my answer here.

The short answer is yes, you can convert any kind of measure to a VAF-equivalent and use it.

The longer answer is that doing so is complicated; I've described some of the difficulties here. The easiest variants to correct for copy number are those in regions with a single-copy deletion. Because each such site will be present in 100% of the tumor cells, it's difficult to confuse them with any other subclone/ploidy combination, and they are easy to correct as well (just divide the VAF by 2). Other cases get considerably more complicated.
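
A minimal sketch of why halving works for the single-copy-deletion case, under a simple mixture model I'm assuming here (diploid normal cells, a clonal deletion, and the mutation on the retained allele): in a pure sample the hemizygous mutation reads at VAF 1.0, and halving it gives the 0.5 expected for a clonal heterozygous copy-neutral mutation. The `expected_vaf` helper is hypothetical, not a sciClone function.

```python
def expected_vaf(ccf: float, purity: float, tumor_cn: float,
                 multiplicity: int = 1, normal_cn: float = 2.0) -> float:
    """Expected VAF under a simple purity/copy-number mixture model
    (assumes normal cells carry `normal_cn` wild-type copies)."""
    mutant_copies = purity * ccf * multiplicity
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return mutant_copies / total_copies

if __name__ == "__main__":
    # Clonal mutation on the retained allele of a single-copy deletion, pure sample.
    hemizygous = expected_vaf(ccf=1.0, purity=1.0, tumor_cn=1)
    neutral = expected_vaf(ccf=1.0, purity=1.0, tumor_cn=2)
    print(hemizygous / 2, neutral)  # halving matches the copy-neutral value

    # With 40% normal contamination, halving only approximates the copy-neutral value.
    print(expected_vaf(1.0, 0.6, 1) / 2, expected_vaf(1.0, 0.6, 2))
```

The second comparison shows that, under this model, the simple halving is exact only for a pure sample; with normal contamination a purity-aware conversion gives a somewhat different value.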

updated 6.6 years ago by Ram 45k • written 9.2 years ago by Chris Miller 22k

Hi Chris,

Thanks for the prompt reply. I think the problem is even more complex, since copy-number calling in exome-seq data is itself less reliable than in whole-genome data. However, given the relatively low number of mutations, one is always tempted to use as many of them as possible.

9.2 years ago by marki ▴ 60

Indeed, I understand the pain. One more thing you can try is to look at the depth of your data and see whether it makes sense to lower your minimum depth setting. You'll lose a little discriminative power between clusters that are very close together, but you may gain additional data points.
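
A rough way to see the trade-off: a VAF measured from n reads has a binomial sampling error of about sqrt(v(1 − v)/n), so lowering the depth cutoff widens the spread of every point, and two clusters whose means differ by less than that spread become hard to separate. The sketch below uses a normal approximation and hypothetical read counts; `passing_sites` and `min_depth` are illustrative names (in sciClone itself the relevant option is, as far as I know, the `minimumDepth` argument).

```python
import math
from typing import List, Tuple

def vaf_half_width(vaf: float, depth: int, z: float = 1.96) -> float:
    """Approximate 95% confidence half-width of a VAF estimate
    (normal approximation to the binomial)."""
    return z * math.sqrt(vaf * (1 - vaf) / depth)

def passing_sites(sites: List[Tuple[str, int, int]], min_depth: int) -> List[str]:
    """Keep (id, ref_reads, var_reads) sites whose total depth is >= min_depth."""
    return [sid for sid, ref, var in sites if ref + var >= min_depth]

if __name__ == "__main__":
    # Lower depth -> wider uncertainty around each VAF point.
    for depth in (30, 50, 100, 200):
        print(f"depth {depth:>3}: VAF 0.25 +/- {vaf_half_width(0.25, depth):.3f}")

    # Hypothetical read counts for four mutations.
    sites = [("m1", 80, 30), ("m2", 40, 15), ("m3", 25, 10), ("m4", 150, 60)]
    print(passing_sites(sites, min_depth=100))  # stricter cutoff, fewer points
    print(passing_sites(sites, min_depth=30))   # looser cutoff, more points
```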

9.2 years ago by Chris Miller 22k