Question

SciClone not numeric error

1

8.6 years ago

johntikas ▴ 10

I tried to run sciClone on test data before applying any script to my actual patient database, but the program keeps returning the following error message:

[1] "checking input data..."
[1] "No copy number files specified. Assuming all variants have a CN of 2."
[1] "ERROR: column vaf in sample 5 is not numeric"
Error in cleanAndAddCN(vafs[[i]], copyNumberCalls[[i]], i,cnCallsAreLog2, :

I have only tried entering SNP data, as my database contains mainly SNVs from targeted sequencing of specific gene panels. Below are the script that I tried to run and a link to the test files.

library(sciClone)
v1 = read.table("folder/nrm.dat");
v2 = read.table("folder/tum1.dat");
v3 = read.table("folder/tum2.dat");
names = c("Normal","Tumor1","Tumor2")
sc = sciClone(vafs=list(v1,v2,v3), sampleNames=names[1:3])

Files: nrm.dat, tum1.dat and tum2.dat

https://www.sendspace.com/filegroup/ram8xDRCKi9mxE7vrYQeE2rjkmLSGffY

SNP snp R genome next-gen • 3.5k views

ADD COMMENT • link updated 8.6 years ago by Chris Miller 22k • written 8.6 years ago by johntikas ▴ 10

score 0 · Answer 1 · 2016-04-08

8.6 years ago

cbst ▴ 160

You can try the following:

Make sure your variant allele frequency variable is a value between 0 and 100.
Convert the vaf variable into a numeric variable, and/or convert your file into a data frame.

For example for sample 1:

v1 <- data.frame(v1)

v1$vaf <- as.numeric(v1$vaf)

(But make sure that vaf is not a factor; otherwise as.numeric will return the factor's integer level codes, not the actual values.)
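The factor caveat above can be seen in a minimal sketch using base R only. The values here are made up for illustration and are not taken from the poster's files; the point is that a factor has to go through as.character before as.numeric.

```r
# Illustrative VAF values stored as a factor (hypothetical data).
vaf_factor <- factor(c("13.0841", "0.0000", "15.2672", "7.0175"))

# Direct conversion returns the integer level codes, not the VAFs.
level_codes <- as.numeric(vaf_factor)

# Converting to character first recovers the actual values.
vaf_numeric <- as.numeric(as.character(vaf_factor))

print(level_codes)
print(vaf_numeric)
```

If the two printed vectors differ, the column was a factor and the direct conversion would have silently produced wrong numbers.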

ADD COMMENT • link 8.6 years ago by cbst ▴ 160

I tried the as.numeric solution, and it turns out that all values in V3, V4, and V5 are converted into irrelevant/random numbers. My VAFs are expressed as percentages (%) with four digits after a decimal comma, e.g. 18,4587%.

ADD REPLY • link 8.6 years ago by johntikas ▴ 10

score 0 · Answer 2 · 2016-04-08

8.6 years ago

Chris Miller 22k

Commas are not the same as a decimal point in R.

Your file inputs look like this:

1 4479383 186 28  13,0841
1 6575255  48  0  0,0000
1 7083445 111 20  15,2672
1 8476489 106  8  7,0175

When they need to look like this:

1 4479383 186 28  13.0841
1 6575255  48  0  0.0000
1 7083445 111 20  15.2672
1 8476489 106  8  7.0175

It appears that you can also tell R to treat commas as decimal points, doing something like the below:

a = read.table("tum2.dat",dec=",",sep="\t")
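A self-contained sketch of the dec="," approach, reading the four example rows from an in-memory string instead of a file. The column names are assumptions added for readability (the original files have no header), and the tab separator follows the call above.

```r
# The four example rows, tab-separated, with decimal commas in the last column.
raw_lines <- c(
  "1\t4479383\t186\t28\t13,0841",
  "1\t6575255\t48\t0\t0,0000",
  "1\t7083445\t111\t20\t15,2672",
  "1\t8476489\t106\t8\t7,0175"
)

# dec = "," tells read.table to parse "13,0841" as the number 13.0841.
# Column names are illustrative only.
tum <- read.table(
  text = raw_lines, sep = "\t", dec = ",",
  col.names = c("chr", "pos", "ref_reads", "var_reads", "vaf")
)

str(tum)
stopifnot(is.numeric(tum$vaf))  # fail early, before the data reaches sciClone
```

Checking is.numeric on the vaf column before calling sciClone catches the problem at read time rather than inside the package.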

ADD COMMENT • link 8.6 years ago by Chris Miller 22k

Although the test files work just fine, my actual patient data give me "Error in kmeans(X, N.c, nstart = 1000) : more cluster centers than distinct data points." I've tried header=FALSE and still get the same error.

ADD REPLY • link 8.6 years ago by johntikas ▴ 10

If you set maxClusters to 10, and you only have, say, 7 points that make it through filtering, the algorithm will choke. Setting it to a lower number may be a short-term fix, but you're unlikely to get reasonable clustering results with only a handful of points anyway. Check your CN calls to make sure you haven't over-called, and/or reduce your minimumDepth (which may give you more points, at the expense of increasing the uncertainty of their true position).
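A rough way to check this before running sciClone is to count how many sites would survive a depth cutoff. This is only a sketch in base R: it assumes depth is ref_reads + var_reads and uses illustrative column names and values; sciClone's own filtering (for example on copy number) may remove further sites.

```r
# Hypothetical input in the five-column layout shown earlier in the thread.
vafs <- data.frame(
  chr       = c(1, 1, 1, 1),
  pos       = c(4479383, 6575255, 7083445, 8476489),
  ref_reads = c(186, 48, 111, 106),
  var_reads = c(28, 0, 20, 8),
  vaf       = c(13.0841, 0.0000, 15.2672, 7.0175)
)

minimum_depth <- 100  # cutoff to test; assumed to mirror minimumDepth
max_clusters  <- 10   # value assumed to mirror maxClusters

# Assumption: depth is the sum of reference and variant reads.
depth <- vafs$ref_reads + vafs$var_reads
keep  <- depth >= minimum_depth

n_usable   <- sum(keep)
n_distinct <- length(unique(vafs$vaf[keep]))

cat("sites passing depth filter:", n_usable, "\n")
cat("distinct VAF values among them:", n_distinct, "\n")
if (n_distinct < max_clusters) {
  cat("fewer distinct points than max_clusters; clustering is likely to fail\n")
}
```

Running this per sample shows quickly whether lowering the depth cutoff actually recovers more points.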

ADD REPLY • link 8.6 years ago by Chris Miller 22k

I don't have copy numbers, just SNPs. I tried lowering minimumDepth, but I get the same result. My files contain 7-300 rows of SNP data for all sites per patient (primary, metastatic, normal), and I really want to get this working before the ESMO submission deadline.

ADD REPLY • link 8.6 years ago by johntikas ▴ 10

1) Normal files should not be used for clustering.
2) What stdout is sciClone producing? It should say something about the number of copy-number neutral sites with adequate depth.

ADD REPLY • link 8.6 years ago by Chris Miller 22k

Hi Miller,

I am trying to add CNV calls (from VarScan) as input. The format is as follows:

1 861322 2453157 -0.0003

The fourth column is segment_mean. Is this format right as input? The reason I ask is that the output figure looks a little weird: it does not show any peak around 50% VAF, although there are actually many such variants in the VCF file.

ADD REPLY • link 6.5 years ago by CY ▴ 750

You really should ask this as a top-level question, not bury it in the comments of someone else's thread. Are your CN values log2? If so, then you need to set the appropriate "cnCallsAreLog2" parameter.
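To illustrate why the scale matters, the sketch below converts a log2 ratio to an absolute copy number. It assumes the ratio is measured against a diploid (CN = 2) baseline, which is the conventional conversion and not necessarily what sciClone does internally; log2_to_cn is a hypothetical helper, and the column names are illustrative.

```r
# Hypothetical helper: absolute copy number from a log2 ratio,
# assuming a diploid baseline (ploidy = 2).
log2_to_cn <- function(log2_ratio, ploidy = 2) {
  ploidy * 2^log2_ratio
}

# The example segment from the post above (column names are illustrative).
seg <- data.frame(
  chr = 1, start = 861322, stop = 2453157, segment_mean = -0.0003
)

seg$abs_cn <- log2_to_cn(seg$segment_mean)
print(seg)
```

Under that assumption, a segment_mean near 0 corresponds to a copy number near 2. If the same value were read as an absolute copy number, it would be nowhere near 2, which would likely explain sites being dropped as not copy-number neutral.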

ADD REPLY • link 6.5 years ago by Chris Miller 22k