I tried to run sciClone on test data before applying any script to my actual patient database, but the program keeps returning the following error message:
[1] "checking input data..."
[1] "No copy number files specified. Assuming all variants have a CN of 2."
[1] "ERROR: column vaf in sample 5 is not numeric"
Error in cleanAndAddCN(vafs[[i]], copyNumberCalls[[i]], i,cnCallsAreLog2, :
I've only tried entering SNP data, as my database contains mainly SNVs stemming from targeted sequencing on specific gene panels. Below you can download the test files as well as view the actual script that I tried to run.
I tried the as.numeric solution, and it turns out that all values in V3, V4, and V5 are converted into unrelated, seemingly random numbers. My VAFs are expressed as percentages (%) with four digits after a decimal comma, e.g. 18,4587%
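One likely explanation, assuming the columns were read in as factors: calling as.numeric on a factor returns its internal integer level codes rather than the values shown on screen. The decimal comma and the % sign also stop R from parsing the text as a number. The following is only a minimal sketch; `parse_percent` is a hypothetical helper name and the values are made up for illustration.

```r
# Hypothetical helper: turn strings such as "18,4587%" into numbers (18.4587).
parse_percent <- function(x) {
  x <- as.character(x)                  # factor -> its labels, not level codes
  x <- gsub("%", "", x, fixed = TRUE)   # drop the percent sign
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(trimws(x))
}

# Made-up example values, stored as a factor to mimic the problem.
vaf_raw <- factor(c("18,4587%", "50,0000%", "3,2500%"))

codes <- as.numeric(vaf_raw)      # integer level codes, not the VAFs
vaf   <- parse_percent(vaf_raw)   # the actual percentages
print(vaf)
```

The same helper could be applied to the VAF column (V5 here) before the data frame is handed to sciClone. Reading the file with `read.table(..., dec = ",", stringsAsFactors = FALSE)` may take care of the read-count columns, but a column that still carries a % sign would remain text and need cleaning as above.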
Although the test files work just fine, my actual patient data give me "Error in kmeans(X, N.c, nstart = 1000) : more cluster centers than distinct data points." I've tried header=FALSE and I still get the same error.
If you set maxClusters to 10, and you only have, say, 7 points that make it through filtering, the algorithm will choke. Setting it to a lower number may be a short-term fix, but you're unlikely to get reasonable clustering results with only a handful of points anyway. Check your CN calls to make sure you haven't over-called, and/or reduce your minimumDepth (which may give you more points, at the expense of increasing the uncertainty of their true position).
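A rough way to see how many points survive the depth filter is sketched below. It assumes the 5-column VAF layout (chromosome, position, reference reads, variant reads, VAF) and that depth is the sum of the two read-count columns. `count_usable` is a hypothetical helper, the data are made up, and this does not reproduce sciClone's internal filtering.

```r
# Hypothetical helper: count variants whose total depth (ref + variant reads)
# meets a threshold. Assumes columns 3 and 4 hold ref and variant read counts.
count_usable <- function(vaf_df, min_depth) {
  depth <- vaf_df[[3]] + vaf_df[[4]]
  sum(depth >= min_depth, na.rm = TRUE)
}

# Made-up data in the assumed layout.
vaf_df <- data.frame(
  chr = c("1", "1", "2", "3"),
  pos = c(1000, 2000, 3000, 4000),
  ref = c(80, 150, 30, 400),
  var = c(30, 60, 5, 100),
  vaf = c(27.27, 28.57, 14.29, 20.00)
)

min_depth <- 100
n_usable <- count_usable(vaf_df, min_depth)

# Cap the requested number of clusters by the number of distinct VAF values
# among the variants that pass the depth filter.
keep <- (vaf_df$ref + vaf_df$var) >= min_depth
max_clusters <- min(10, length(unique(vaf_df$vaf[keep])))
print(c(n_usable = n_usable, max_clusters = max_clusters))
```

Capping the cluster count this way only avoids the kmeans error; with so few points the resulting clusters are unlikely to be meaningful, as noted above.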
I don't have copy numbers, just SNPs. I tried lowering minimumDepth, but I get the same result. My files contain 7-300 rows of SNP data for all sites per patient (primary, metastatic, normal), and I really want to get it working before the ESMO submission deadline.
1) Normal files should not be used for clustering
2) What stdout is sciClone producing? It should say something about the number of copy-number neutral sites with adequate depth.
I am trying to add CNV calls (from VarScan) as input. The format is as follows:
1 861322 2453157 -0.0003
The fourth column is segment_mean. Is this format right as input? The reason I ask is that the output figure looks a little odd: it does not show any peak around 50% VAF, although there are actually many such variants in the VCF file.
You really should ask this as a top-level question, not buried in the comments of someone else's post. Are your CN values log2? If so, then you need to set the "cnCallsAreLog2" parameter accordingly.
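To judge whether a segment_mean column holds log2 ratios, it can help to convert a few values by hand. The sketch below uses the common conversion CN = 2 * 2^(log2 ratio), which assumes a diploid baseline; `log2_to_cn` and the column names are hypothetical, chosen only for illustration.

```r
# Convert log2 ratios to absolute copy number, assuming a diploid baseline.
log2_to_cn <- function(log2_ratio) {
  2 * 2^log2_ratio
}

# The segment from the question, in the 4-column layout shown above.
seg <- data.frame(chr = "1", start = 861322, stop = 2453157,
                  segment_mean = -0.0003)
seg$cn <- log2_to_cn(seg$segment_mean)
print(seg)
```

Under that assumption a segment_mean of -0.0003 is essentially two copies. If the same value were instead read as an absolute copy number, the segment would look like a near-total deletion, which could distort the output.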