Hi, I want to use sciClone on my exome sequencing data. I used VarScan to call copy number:
samtools mpileup -B -q 30 -l $exon_bed -f $fa $normal $tumor >$name.tumor-nor.pileup
java -jar VarScan.v2.3.9.jar copynumber $name.tumor-nor.pileup $name --mpileup 1
java -jar VarScan.v2.3.9.jar copyCaller $name.copynumber --output-file filter.$name.copynumber
To obtain the segment means, I used DNAcopy (http://varscan.sourceforge.net/copy-number-calling.html#copy-number-segmentation):
library(DNAcopy)
cn <- read.table("filter.$name.copynumber",header=F)
CNA.object <-CNA(genomdat = as.numeric(cn[,7]), chrom = as.numeric(cn[,1]), maploc = as.numeric(cn[,2]), data.type = 'logratio')
CNA.smoothed <- smooth.CNA(CNA.object)
segs <- segment(CNA.smoothed, verbose=1, min.width=2)
segs2 = segs$output
write.table(segs2[,2:6], file="out.file", row.names=F, col.names=F, quote=F, sep="\t")
Everything above ran without errors.
I got an out.file like this:
1 89 34735 1614 874.4554
1 34736 34750 4 3295.75
1 34751 37517 661 898.7126
1 37518 37536 8 2306.375
1 37537 38024 207 923.657
1 38027 38038 4 3223.75
1 38039 55963 2290 898.076
The segment_mean values are very large, but the values in the example data are mostly less than 3. I used these results as sciClone's input, but the plot is incomplete. sciClone reports the following:
source("run.R")
[1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
604 sites (of 13103 original sites) are copy number neutral and have adequate depth in all samples
120 sites (of 13103 original sites) were removed because of copy-number alterations
12497 sites (of 13103 original sites) were removed because of inadequate depth
12499 sites (of 13103 original sites) were removed because of copy-number alterations or inadequate depth
[1] "clustering..."
kmeans initialization:
V1
0.4092
0.356976470588235
0.256398076923077
0.279226666666667
0.234887804878049
0.311459375
0.480088888888889
0.204376436781609
0.218423943661972
0.7059
Using threshold: 0.7
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 2 with too few variants ( 0 ) center: 0.499965
Dropped cluster 3 with too few variants ( 0 ) center: 0.499965
Dropped cluster 3 with too few variants ( 0 ) center: 0.499965
Dropped cluster 4 with too few variants ( 1 ) center: 0.7047737
Condition ( 1 D): Removing 1 because of overlap ( 0.1035335 ) with i = 3
Condition ( 1 D): Removing 1 because of overlap ( 0.18061 ) with i = 2
Cluster 1 pi = 0.998 center = 0.241 SEM = (0.240, 0.243) sd = (0.204, 0.270)
Outliers:
[,1]
[1,] 0.3
Converged on the following parameters:
mu:
81510.7681795071
alpha:
867.472949068126
nu:
49909.3536335395
beta:
168.832704565698
pi:
0.998344374600373
[1] "finished clustering full-dimensional data..."
[1] "found 1 clusters using bmm in full dimensional data"
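The log above says most sites were removed for inadequate depth, so it may be worth counting how many variants pass a depth cutoff before clustering. The sketch below is only an illustration: count_by_depth is a hypothetical helper, not part of sciClone. It assumes the variant file is tab-separated with reference reads in column 3 and variant reads in column 4, and it uses a cutoff of 100 (which I believe is sciClone's default minimumDepth, but please check your version).

```shell
# Hypothetical helper: count variants whose total depth (ref + var reads)
# reaches a cutoff. Assumes a tab-separated file with ref reads in column 3
# and var reads in column 4; lines where either field is not an integer
# (for example a header line) are skipped.
count_by_depth() {
  file=$1
  min_depth=${2:-100}   # assumed cutoff; adjust to your minimumDepth setting
  awk -F'\t' -v d="$min_depth" '
    $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
      total++
      if ($3 + $4 >= d) kept++
    }
    END { printf "total=%d kept=%d dropped=%d\n", total, kept, total - kept }
  ' "$file"
}
```

It could be called as count_by_depth variants.tsv 100, where variants.tsv is a placeholder for the variant file passed to sciClone. If "dropped" is close to the number reported in the log, the depth cutoff (rather than the copy number file) is what removes most sites.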
All of these variants were assumed to have copy number 2. Even when I divided all the values in the fifth column by 1000, the results did not change.
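A possible sanity check, sketched under assumptions: since the segment means are far larger than the values in the example data, it can help to compare the value range of the DNAcopy input with that of the output. The helper below, col_range, is hypothetical (not part of VarScan, DNAcopy, or sciClone). It assumes tab-separated files, with the adjusted log ratio in column 7 of the copyCaller output (the column index used in the R code above) and the segment mean in column 5 of out.file.

```shell
# Hypothetical helper: report the range of one column of a tab-separated file.
# Fields that do not look like plain numbers (for example a header line)
# are counted separately instead of being parsed.
col_range() {
  file=$1
  col=$2
  awk -F'\t' -v c="$col" '
    $c ~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ {
      v = $c + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      n++
      next
    }
    { bad++ }
    END { printf "numeric=%d non_numeric=%d min=%s max=%s\n", n, bad, min, max }
  ' "$file"
}
```

For example, col_range "filter.$name.copynumber" 7 followed by col_range out.file 5. If column 7 of the input is on a small log-ratio scale while column 5 of out.file is in the hundreds or thousands, the values were probably changed when the file was read into R. A non-zero non_numeric count in the input (such as a header line) would be one thing to look at.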
Hi lyan,
Did you fix your problem? I have a similar one: after running DNAcopy, my segment_mean values are very large. I have around 4000 somatic mutations. Thanks.
HY
I was following this: http://wp.zxzyl.com/?p=156
Hope this helps. Lyan