sciClone:input data wrong
1
0
Entering edit mode
8.1 years ago
lyan.liu • 0

Hi, I want to use sciclone on my exome sequencing data.I use varscan to call copy number:

samtools mpileup -B -q 30 -l $exon_bed -f $fa $normal $tumor >$name.tumor-nor.pileup
java -jar VarScan.v2.3.9.jar copynumber $name.tumor-nor.pileup $name --mpileup 1 java -jar VarScan.v2.3.9.jar copyCaller $name.copynumber --output-file filter.$name.copynumber

In order to obtain the segment_mean data,I use DNAcopy (http://varscan.sourceforge.net/copy-number-calling.html#copy-number-segmentation)

library(DNAcopy) cn <- read.table("filter.$name.copynumber",header=F) CNA.object <-CNA(genomdat = as.numeric(cn[,7]), chrom = as.numeric(cn[,1]), maploc = as.numeric(cn[,2]), data.type = 'logratio')
CNA.smoothed <- smooth.CNA(CNA.object) segs <- segment(CNA.smoothed, verbose=1, min.width=2) segs2 = segs$output write.table(segs2[,2:6], file="out.file", row.names=F, col.names=F, quote=F, sep="\t")

Above all without errors.

I got a out.file like these:

1 89 34735 1614 874.4554
1 34736 34750 4 3295.75
1 34751 37517 661 898.7126
1 37518 37536 8 2306.375
1 37537 38024 207 923.657
1 38027 38038 4 3223.75
1 38039 55963 2290 898.076

The segment_mean value is so large but I saw the value in example data is probably less than 3. I used the results as sciclone's input data,but the plot is not complete. Report these:

source("run.R") [1] "checking input data..."
[1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
604 sites (of 13103 original sites) are copy number neutral and have adequate depth in all samples 120 sites (of 13103 original sites) were removed because of copy-number alterations
12497 sites (of 13103 original sites) were removed because of inadequate depth
12499 sites (of 13103 original sites) were removed because of copy-number alterations or inadequate depth
[1] "clustering..."
kmeans initialization:
V1
0.4092
0.356976470588235
0.256398076923077
0.279226666666667
0.234887804878049
0.311459375
0.480088888888889
0.204376436781609
0.218423943661972
0.7059
Using threshold: 0.7
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 1 with too few variants ( 0 ) center: 0.499965
Dropped cluster 2 with too few variants ( 0 ) center: 0.499965
Dropped cluster 3 with too few variants ( 0 ) center: 0.499965
Dropped cluster 3 with too few variants ( 0 ) center: 0.499965
Dropped cluster 4 with too few variants ( 1 ) center: 0.7047737
Condition ( 1 D): Removing 1 because of overlap ( 0.1035335 ) with i = 3
Condition ( 1 D): Removing 1 because of overlap ( 0.18061 ) with i = 2
Cluster 1 pi = 0.998 center = 0.241 SEM = (0.240, 0.243) sd = (0.204, 0.270)
Outliers:
[,1]
[1,] 0.3
Converged on the following parameters:
mu:
81510.7681795071
alpha:
867.472949068126
nu:
49909.3536335395
beta:
168.832704565698
pi:
0.998344374600373
[1] "finished clustering full-dimensional data..."
[1] "found 1 clusters using bmm in full dimensional data"

The copy number of these variants all was assumed to be 2,even if I divide all data in fifth column by 1000,The results did not change.

sciClone varscan DNAcopy cluster copy number • 3.7k views
ADD COMMENT
0
Entering edit mode

Hi, lyan:

Did you fix your problem? I had the similar problem with you. After running the DNA copy, my segment_mean value is very big. I have around 4000 somatic mutations. Thanks.

HY

ADD REPLY
0
Entering edit mode

I was following this :http://wp.zxzyl.com/?p=156
hope this helps. Lyan

ADD REPLY
1
Entering edit mode
8.1 years ago

1) First of all, yes, something is clearly not right in your copy number results. Make sure that you're grabbing the correct input columns when you're feeding it to DNAcopy

2) SciClone tells you exactly what it's doing:

"Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."

For some reason, your entire genome isn't tiled by the DNAcopy segments (which is unusual) Again, fix your cn calls and you'll be in better shape

3) Finally, 13000 somatic variant calls is a lot for an exome. Is this a hypermutated sample or something like a heavily UV damaged melanoma? If not, then you may want to examine your variant inputs.

ADD COMMENT
0
Entering edit mode

Thanks very much for your reply!
1) I used varscan copynumber(the commands in the top page) to generate my copynumber result ,like these :
chrom chr_start chr_stop num_positions normal_depth tumor_depth log2_ratio gc_content
1 65873 65972 100 66.3 91.5 0.464 45.0
1 69482 69513 32 150.8 15.8 -3.259 53.1
1 69514 69600 87 229.4 41.7 -2.461 50.6
the log2_ratio is the column 7,so I use DNAcopy like this : library(DNAcopy)
cn <- read.table("$name.copynumber",header=F)
CNA.object <-CNA(genomdat = as.numeric(cn[,7]), chrom = as.numeric(cn[,1]), maploc = as.numeric(cn[,2]), data.type = 'logratio')
CNA.smoothed <- smooth.CNA(CNA.object)
segs <- segment(CNA.smoothed, verbose=1, min.width=2)
segs2 = segs$output
write.table(segs2[,2:6], file="out.file_test", row.names=F, col.names=F, quote=F, sep="\t")

I was so confused .

3) I test another sample which had only 4000 somatic variants,the report also like these:

source("run.R")
[1] "checking input data..." [1] "Not all variants fall within a provided copy number region. The copy number of these variants is assumed to be 2."
66 sites (of 4127 original sites) are copy number neutral and have adequate depth in all samples
65 sites (of 4127 original sites) were removed because of copy-number alterations
4061 sites (of 4127 original sites) were removed because of inadequate depth
4061 sites (of 4127 original sites) were removed because of copy-number alterations or inadequate depth
[1] "clustering..."
......

the script run.R like: library(sciClone)
v0 = read.table("Sample_1.vafs", sep="\t")
cn0 = read.table("out.file_test")
cn0 = cn0[,c(1,2,3,5)]
clusterParams="empty"
sc = sciClone(vafs=list(v0), sampleNames=c("MMY4"), copyNumberCalls=list(cn0), minimumDepth=100, doClustering=TRUE, clusterParams=clusterParams,maximumClusters=10, copyNumberMargins=0.25, useSexChrs=FALSE)
writeClusterTable(sc, "clusters")
writeClusterSummaryTable(sc, "cluster.summary")
save.image("out.Rdata")
source("plot.R")

ADD REPLY
0
Entering edit mode

Other than CN strangeness, the biggest problem appears to be that the minimumDepth parameter is set too high for the data you have. If you have 30x sequencing, then using a minimum depth of 100x does not make any sense. You should also be aware of the tradeoffs between depth and ability to distinguish clusters - see Fig 5 here: http://www.cell.com/cell-systems/abstract/S2405-4712(15)00113-1

ADD REPLY
0
Entering edit mode

I tried to use minimumDepth=10,only to be filtered less sites, the other did not changed.The problem should be CN data abnormal.Are there any other ways to call copy number or generate segment_mean data ?

ADD REPLY
0
Entering edit mode

Sure, there are many programs that are able to call copy number from paired tumor normal bams. cn.mops is one that comes to mind - there are many others that a quick search on this or other sites will turn up.

ADD REPLY

Login before adding your answer.

Traffic: 1683 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6