Question

Question about sciClone example data and test script

1

8.6 years ago

uki_al ▴ 50

Hello, I just started using sciClone and wanted to get familiar with it using the example data and script on GitHub. I have two quick questions:

1. In the example plots, three different cases were analyzed, based on the number of input VAF files from three related tumor samples. So, for the 1d outputs, there are three files, one for each call of the sciClone() function. My question is: should the results for a given sample be identical if you call the function just once with all three tumor samples? In the file 'clusters3.1d.pdf' there are three pages, one clonality plot for each sample. If you compare the first page of that file to the first page of 'clusters1.1d.pdf', the results are indeed identical. But if you compare the second page of 'clusters3.1d.pdf' to the second page of 'clusters2.1d.pdf', they are not identical (there are 5 clusters in 'clusters2.1d.pdf' as opposed to the two clusters in 'clusters3.1d.pdf'). The same goes for the 2d cluster results: in the scenario where 2 tumor VAF files were provided as inputs there are 5 clusters, but in the scenario with 3 tumor VAF files there are two clusters on the 'Sample1 vs Sample2' graph. Is this an important difference, and why does it happen?
2. If I have more than three tumor VAF inputs, what will the outputs look like? Will the 3d output file contain multiple graphs, one for each combination of three samples, e.g. 1-2-3, 1-3-4 and 2-3-4, or is the maximum number of VAF inputs 3?

Thank you for any answers in advance and sorry if something like this has already been asked. With respect, Uros Sipetic

sciclone • 4.4k views

ADD COMMENT • link updated 8.6 years ago by Nikleotide ▴ 130 • written 8.6 years ago by uki_al ▴ 50

score 0 · Answer 1 · 2016-07-01

0

8.6 years ago

Chris Miller 22k

If you're running the example script (https://github.com/genome/sciclone-meta/blob/master/tests/shortTest.R), then the answers are as follows:

1. If you provide 3 samples to the clustering (as is done on line 83), sciClone assigns points to clusters in three dimensions, using all available data. When creating 1d plots, it then displays those points using just the VAFs of the given sample (though the colors/clusters still reflect the 3d clustering). So yes, if you're comparing the 2-sample results to the 3-sample results, there are likely to be differences!
2. If you provide more than three tumor VAF inputs, sciClone will, by default, output all possible 2d plots (every pairing of samples). The 3d plot is a bit different because it explicitly requires you to give only 3 samples - you'll have to call it multiple times to get your different trios.
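
To make the counting concrete, here is a small Python sketch (not sciClone code; the sample names are made up for illustration) that enumerates the 2d pairings and the 3d trios for four samples. Note that four samples give four trios, so 1-2-4 joins the three combinations listed in the question.

```python
from itertools import combinations

# Hypothetical sample names, for illustration only.
samples = ["Sample1", "Sample2", "Sample3", "Sample4"]

# Every pairing of samples: one 2d plot per pair.
pairs = list(combinations(samples, 2))

# Every trio of samples: one 3d plot call per trio.
trios = list(combinations(samples, 3))

print(len(pairs), "pairs;", len(trios), "trios")
for trio in trios:
    print("-".join(trio))
```

The number of trios grows quickly with the number of samples, so in practice you would probably plot only the trios you care about.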

Best of luck!

ADD COMMENT • link 8.6 years ago by Chris Miller 22k

0

(Sorry for repost, just realized replies should go here)

Hm, thank you for your answers. A follow up on question 1:

In the example script, when calling the sciClone() function on 1 sample, the parameter regionsToExclude was set to "reg1". But when calling it with 2 samples, the same parameter was not set at all. Is there a reason for it, or was it just for showcase? Also, when calling it with 3 samples, it is set to "list(reg1,reg1)". Why is it set in that exact format? Does making a list with 2 objects in it mean to just exclude regions in the first two samples, and not the third one? If I want to exclude it in all three samples, can I call it with "list(reg1,reg1,reg1)"?

And about your first answer, I'm still not sure I understood completely. When looking at the 1d plots from just one sample and from three samples, in both cases there are 2 clusters, yet when providing 2 samples, there are 5 clusters. It's the same situation when looking at 2d plots from two samples and from three samples: in the case with 2 samples there are 5 clusters and in the case with 3 samples there are 2 clusters (even for the sample1-vs-sample2 graph, which I thought should be identical?). So my only question would be: if I had three samples for example, is there a point in calling the sciClone() function multiple times with different inputs of two samples, or can I just call it once with 3 samples? Am I getting all the correct results with just one call of the function?

Sorry for bothering with this, and thanks again in advance for any answers.

With respect, Uros Sipetic

ADD REPLY • link 8.6 years ago by uki_al ▴ 50

0

> In the example script, when calling the sciClone() function on 1 sample, the parameter regionsToExclude was set to "reg1". But when calling it with 2 samples, the same parameter was not set at all. Is there a reason for it, or was it just for showcase?

Just showcasing the options.

> Also, when calling it with 3 samples, it is set to "list(reg1,reg1)". Why is it set in that exact format? Does making a list with 2 objects in it mean to just exclude regions in the first two samples, and not the third one? If I want to exclude it in all three samples, can I call it with "list(reg1,reg1,reg1)"?

You're right that this wasn't very clear. The idea is that if you have lists of regions to exclude per sample, you can send them all in. In the end, it doesn't matter, as all regions to exclude get excluded from all samples.
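
As a rough illustration of that last point, here is a Python sketch of "all regions to exclude get excluded from all samples". It reflects one reading of the reply, not sciClone's actual implementation, and the region lists and variant positions are hypothetical.

```python
# Hypothetical exclusion lists, one per sample: (chr, start, stop).
regions_sample1 = [("1", 100, 200)]
regions_sample2 = [("2", 500, 900)]

# Pool every per-sample list into a single set of regions.
all_regions = set(regions_sample1) | set(regions_sample2)

def is_excluded(chrom, pos, regions):
    # True if the position falls inside any pooled region.
    return any(c == chrom and start <= pos <= stop for c, start, stop in regions)

# Hypothetical variant positions shared by all samples: (chr, pos).
variants = [("1", 150), ("1", 300), ("2", 600)]

# A variant inside any pooled region is dropped for every sample.
kept = [v for v in variants if not is_excluded(v[0], v[1], all_regions)]
print(kept)
```

Under this reading, passing the same region list once or several times makes no difference, because duplicates collapse when the lists are pooled.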

> if I had three samples for example, is there a point in calling the sciClone() function multiple times with different inputs of two samples, or can I just call it once with 3 samples? Am I getting all the correct results with just one call of the function?

You should always use as many samples as you have in the clustering, otherwise you're throwing away information that might help you cluster more accurately! Doing individual samples or pairs just doesn't make sense when you have more data.
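
A toy Python example of why extra samples carry information (the VAF values are made up, and this is not how sciClone clusters; it only compares group centroids): two groups of variants that look identical in Sample1 become clearly separable once Sample2 is included.

```python
# Toy VAFs (made up): each variant is (vaf_sample1, vaf_sample2).
clone_a = [(0.40, 0.42), (0.41, 0.40), (0.39, 0.41)]
clone_b = [(0.40, 0.10), (0.42, 0.11), (0.38, 0.09)]

def mean(values):
    return sum(values) / len(values)

def centroid(points):
    # Per-sample mean VAF of a group of variants.
    return tuple(mean(axis) for axis in zip(*points))

ca, cb = centroid(clone_a), centroid(clone_b)

# Separation seen when only Sample1 is used (1d view).
gap_1d = abs(ca[0] - cb[0])
# Separation seen when both samples are used (2d view).
gap_2d = ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5

print(f"Sample1 only: {gap_1d:.3f}")
print(f"Sample1 + Sample2: {gap_2d:.3f}")
```

This shows only one direction of the effect. In the example data discussed in this thread, adding the third sample led to fewer clusters than the two-sample run, which is also consistent with the joint clustering using information the smaller runs did not have.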

Best of Luck!

ADD REPLY • link 8.6 years ago by Chris Miller 22k

0

Thank you a lot for your answers, Mr Chris Miller! One last quick question:

> The 3d plot is a bit different because it explicitly requires you to give only 3 samples - you'll have to call it multiple times to get your different trios.

By this, do you mean I'll have to call the sciClone() function multiple times, not the plot3d() function? As far as I can see, plot3d() is a method called on the sciClone object. Is there a way to invoke the sciClone() function once and then call plot3d() as many times as I need for my different trios, or do I have to call sciClone() multiple times?

ADD REPLY • link 8.6 years ago by uki_al ▴ 50

0

Call sciClone() once, then call plot3d multiple times if necessary to accommodate all samples.

ADD REPLY • link 8.6 years ago by Chris Miller 22k

0

Yes, I just saw how to call it with different sample names. Sorry for not reading the manual, and thanks for the answer!

I actually have another question: in the example cn files, are the values in the last column copy numbers in log2 format, or something else? The parameter cnCallsAreLog2 is set to FALSE by default, and the example run script doesn't change it to TRUE. The cn files are loaded like this: cn1 = cn1[,c(1,2,3,5)]. From a quick look at the file, I figured column 4 is meant to be absolute copy numbers and column 5 copy numbers in log2 format. Should that parameter be set to TRUE then, or is this something else?

Also, is there a general best practice recommendation as to what to use for getting the cn files? Is using the output of Control-FREEC for both WGS and WES data, with the last column in those files being absolute cn data, an ok solution?

ADD REPLY • link 8.6 years ago by uki_al ▴ 50

0

I'm agnostic to CN callers. We frequently use varscan and copyCat, but there are many others that are reasonable. The input files should be in the quasi-standard 5-column format: chr, start, stop, num.probes, segment_mean. Yes, if col5 is log2, then cnCallsAreLog2 should be set to TRUE.

ADD REPLY • link 8.6 years ago by Chris Miller 22k

0

Hi, thanks! So, by 'segment_mean' you are referring to absolute copy number? And in the example files, does col5 represent log2 values, or...?

ADD REPLY • link 8.6 years ago by uki_al ▴ 50

0

Column 5 in the example data is absolute copy number (not log2), where 2 = normal diploid:

1   1   315000  3   1.29315929380905
1   315001  50150000    4827    2.26922552952336
1   50150001    51060000    91  2.52188064895859
1   51060001    70690000    1962    2.23147864464993
1   70690001    70720000    3   1.93053404974351
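
For readers unsure how the two scales relate, here is a minimal Python sketch of the usual conversion, assuming a diploid baseline (a log2 ratio of 0 corresponds to 2 copies). The function names are made up for illustration, and this is not necessarily the exact arithmetic sciClone applies when cnCallsAreLog2 is TRUE.

```python
import math

def log2_ratio_to_absolute(log2_ratio, ploidy=2):
    # A log2 ratio of 0 means "same as normal", i.e. `ploidy` copies.
    return ploidy * 2 ** log2_ratio

def absolute_to_log2_ratio(copy_number, ploidy=2):
    # Inverse of the conversion above.
    return math.log2(copy_number / ploidy)

# Segment means taken from the example data above.
for cn in [1.29315929380905, 2.26922552952336, 2.52188064895859]:
    ratio = absolute_to_log2_ratio(cn)
    print(f"{cn:.3f} copies -> log2 ratio {ratio:+.3f}")
```

Values like these are non-integer because a segment mean averages over many probes (and over a mixture of cells), which is why absolute copy number here is not restricted to whole numbers.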

ADD REPLY • link 8.6 years ago by Chris Miller 22k

0

Hi, thanks for the answer once more! I was a bit confused at first because I expected absolute copy numbers to be integers. But I found another post and realized these numbers are probably converted from log2 format, and that rounding is used in the process.

I've actually tried using the Varscan2 pipeline with the DNAcopy tool to get the proper input files for sciClone. The whole pipeline is recommended on the Varscan2 website: http://varscan.sourceforge.net/copy-number-calling.html and, as I saw, the code for DNAcopy was proposed by you as well, so thank you for that one too!

Though I just wanted to ask one thing quickly, the results I get are like, for example (omitting column 4):

chr1 13026 20669 45.7133
chrM 512 5744 63.1935
chrM 5844 16308 75.9648

They differ in that it seems only some segments are kept (and it's those with a really high cn value). Should my input cn files be like the example files, i.e. with all segments, or will this not work? Do any special changes or parameters need to be made or set in the DNAcopy part or the Varscan copynumber/copycaller parts?

ADD REPLY • link 8.6 years ago by uki_al ▴ 50

0

Hi, may I ask what the 4th column is? In the readme file on github, they say 4 columns (#read in segmented copy number data #4 columns - chr, start, stop, segment_mean)

Any idea?

ADD REPLY • link 8.6 years ago by Nikleotide ▴ 130

0

Whoops - yeah, 4 columns are necessary for sciclone. I usually use a little code to strip that 4th (of 5 columns) out. The typical CN output format is [chr, start, stop, num_probes, segment_mean]. SciClone needs [1:3,5]
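
The same column selection can be sketched outside R. The Python snippet below is only an illustration (the helper name is made up): it takes the first rows of the example CN data quoted earlier in the thread and keeps columns 1-3 and 5, mirroring cn1[,c(1,2,3,5)].

```python
import csv
import io

# First rows of the example CN data quoted earlier in the thread
# (chr, start, stop, num_probes, segment_mean), tab-separated.
five_col = (
    "1\t1\t315000\t3\t1.29315929380905\n"
    "1\t315001\t50150000\t4827\t2.26922552952336\n"
)

def drop_num_probes(text):
    # Keep columns 1-3 and 5, like cn[, c(1,2,3,5)] in R.
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return [[r[0], r[1], r[2], r[4]] for r in rows if r]

four_col = drop_num_probes(five_col)
for row in four_col:
    print("\t".join(row))
```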

ADD REPLY • link 8.6 years ago by Chris Miller 22k

score 0 · Answer 2 · 2016-07-12

0

8.6 years ago

Nikleotide ▴ 130

I have two samples I am trying to work out with sciClone. I tried to follow the exact format given in the readme file but once I try to run the sciClone, this is the error I get:

[1] "checking input data..."
Error in `$<-.data.frame`(`*tmp*`, "cn", value = c(2.23948, 2.23948, 2.23948, : replacement has 5 rows, data has 1

I would really appreciate any input.

Thanks.

ADD COMMENT • link 8.6 years ago by Nikleotide ▴ 130

0

1) I'd suggest posting this in a new question.
2) We're going to need more info to help. Can you post the first few lines of your CN file, as well as the code that reads it in and calls the sciClone() function?

ADD REPLY • link 8.6 years ago by Chris Miller 22k