I'm trying out sciClone on my exome-seq data. It works fine when run on one sample, but it reports an error when I use two samples (tumors from two different individuals) for comparison.
I use the SNV and CNA information.
The SNV file format:
contig position t_ref_count t_alt_count tumor_f (all tab-delimited)
The CNA file format: chr, start, stop, segment_mean (produced by the VarScan and DNAcopy processing pipeline)
The code used for the two-sample comparison:
sc = sciClone(vafs=list(v1,v2),copyNumberCalls=list(cn1,cn2),
sampleNames=names[1:2],cnCallsAreLog2=TRUE,minimumDepth=50)
But it only reports this error:
[1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
[1] "ERROR: no sites are copy number neutral and have adequate depth in all samples"
I've tried changing the minimumDepth value, but it still doesn't work. The SNV and CNA file header names shouldn't be the problem: I deleted the headers (reading the files with read.table(file)) and the problem still exists.
I'd appreciate any help.
Do you have readcounts for each site in both tumors? The first two columns of the lists of variants that you feed in should be identical.
Do you mean I have to merge the chromosome position of the two samples?
For example:
tumor1:
tumor2:
So, the final first two columns should be:
tumor1:
adding the positions that don't appear in the tumor1 mutation list (in other words, the positions that differ between the two samples): here "chr1 112", "chr1 278", etc. from tumor2.
And the same additions and calculations for tumor2?
Is your tumor polyploid, or mostly CN altered? If so, no points will be usable.
Actually, from the density plot of sample 1, it looks like the majority of sites are single-copy amplified (presumably the wild-type allele is amplified), which would suggest triploidy. Could you explain why a polyploid tumor can't work?
Is your CN data formatted incorrectly? By default, sciClone expects absolute copy number values (CN 2 = neutral, CN 3 = 1 copy amplified, etc). You can feed it log2 values by passing the appropriate flag.
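To check whether log2 segment means translate into any copy-number-neutral regions at all, a quick sanity check in base R might look like the sketch below. It assumes the usual diploid-baseline conversion (absolute CN = 2 * 2^log2 ratio); the segment table and the `margin` tolerance are made up for illustration and are not values read from sciClone.

```r
# Hypothetical segments in the chr/start/stop/segment_mean layout
cn <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(1, 5000001, 1),
  stop = c(5000000, 9000000, 8000000),
  segment_mean = c(0.0, 0.585, -1.0)
)

# Standard conversion for a diploid baseline: CN = 2 * 2^log2ratio
cn$abs_cn <- 2 * 2^cn$segment_mean

# Illustrative tolerance around CN 2 (an assumption for this sketch)
margin <- 0.25
cn$neutral <- abs(cn$abs_cn - 2) <= margin

print(cn)
# Fraction of segments that would count as neutral under this tolerance
print(mean(cn$neutral))
```

If almost no segments come out near 2 after conversion, the segment means may be shifted (for example, centered on a non-diploid baseline), which would leave no neutral sites to cluster.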
I set the cnCallsAreLog2=TRUE.
Is your data low-coverage, such that no points are exceeding the minimum depth threshold?
In the one-sample run, minimumDepth values of 100 and 50 both work fine. I tried different minimumDepth values for the two-sample run, and even setting it to 10 doesn't work. Does the minimumDepth value represent the true coverage?
Thanks!
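One way to see how many sites could survive a depth threshold in both tumors is sketched below in base R. It assumes depth at a site is simply reference reads plus variant reads, and that both tables already list the same sites in the same order; the tables `t1` and `t2` and their counts are made up for illustration.

```r
# Hypothetical merged tables: same sites, same order, counts filled in for both tumors
t1 <- data.frame(contig = c("chr1", "chr1", "chr2"), position = c(100L, 200L, 300L),
                 t_ref_count = c(80L, 60L, 5L), t_alt_count = c(20L, 40L, 0L))
t2 <- data.frame(contig = c("chr1", "chr1", "chr2"), position = c(100L, 200L, 300L),
                 t_ref_count = c(30L, 70L, 90L), t_alt_count = c(0L, 30L, 10L))

min_depth <- 50  # same value as passed to minimumDepth

# Assumption: depth at a site is reference reads plus variant reads
d1 <- t1$t_ref_count + t1$t_alt_count
d2 <- t2$t_ref_count + t2$t_alt_count

# A site is usable only if it clears the threshold in every sample
ok_both <- d1 >= min_depth & d2 >= min_depth
print(table(ok_both))
print(t1[ok_both, c("contig", "position")])
```

A site called in only one tumor typically has no counts recorded for the other, so it can never pass in both samples no matter how low the threshold is set.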
1) Regarding the mutation lists, if you call the following mutations:
You then need to merge the calls and pull readcounts for every site in both samples, so your files will look like:
2) Read the sciClone paper for a full explanation of why we only consider copy-number neutral sites for clustering.
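For point 1, the merge step can be sketched in base R as follows. The tables `v1` and `v2` hold made-up calls in the contig/position/t_ref_count/t_alt_count/tumor_f layout from the question. The sketch only builds the union of sites and shows which rows still lack counts; the readcounts for those rows have to be pulled from each sample's BAM with a separate tool, which is not shown here.

```r
# Hypothetical calls for two tumors
v1 <- data.frame(contig = c("chr1", "chr1"), position = c(100L, 200L),
                 t_ref_count = c(80L, 60L), t_alt_count = c(20L, 40L),
                 tumor_f = c(0.20, 0.40), stringsAsFactors = FALSE)
v2 <- data.frame(contig = c("chr1", "chr2"), position = c(200L, 300L),
                 t_ref_count = c(70L, 90L), t_alt_count = c(30L, 10L),
                 tumor_f = c(0.30, 0.10), stringsAsFactors = FALSE)

# Union of all sites called in either sample
sites <- unique(rbind(v1[, 1:2], v2[, 1:2]))

# Left-join each sample onto the union; sites not called in a sample come back
# as NA and still need readcounts pulled from that sample's BAM
m1 <- merge(sites, v1, by = c("contig", "position"), all.x = TRUE)
m2 <- merge(sites, v2, by = c("contig", "position"), all.x = TRUE)

# The first two columns now match row for row
stopifnot(all(m1$contig == m2$contig & m1$position == m2$position))

# Sites whose counts are still missing in each sample
print(m1[is.na(m1$t_ref_count), 1:2])
print(m2[is.na(m2$t_ref_count), 1:2])
```

Once the NA rows are filled in with real counts, the two tables have identical first two columns and can be passed together as `vafs=list(m1, m2)`.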
OK, thank you. So if I pre-filter the copy-number amplified and deleted regions and keep only the neutral regions to feed to sciClone, would that work?
SciClone can take care of removing those regions, if all your inputs are set up correctly. No need to filter yourself.
I think there are some problems with my copy-number data. I'll adjust the CNA calling and then try sciClone again.