Question

Question regarding for sciClone when doing a 2d plot

0

Entering edit mode

9.8 years ago

sxl919 ▴ 10

Hi,

I think I have ran into a problem when I was using sciClone in order to generate a 2d plot. I have my bam files which have been processed by varscan somatic and I got the snp and indel files. After doing somatic filter and processsomatic, I put the data in somatic file to sciClone. As I have 2 samples, I instruct the program to read 2 set of snv data and copynumber data.The data format for the snv file looks like this

Chromosome      position   tumor_read1   tumor_read2   vaf_precentage
Chr 1                   10011             8               4                        30
Chr 2                   12331             6               3                        25
...
Chr Y

However, when I using sciClone in order to generate the plot and cluster file, I got the following error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 7, 0
In addition: Warning message:
In max(marginalClust[[i]]$cluster.assignments, na.rm = T) :
  no non-missing arguments to max; returning -Inf

I am really new to R and if I have asked a stupid question, please forgive me.

Thanks

SciClone 2d-plot R • 3.2k views

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by sxl919 ▴ 10

0

Entering edit mode

Please provide the following info, and I'll be happy to take a look: 1) the exact R commands that you're trying to run, 2) the first few lines of your input files

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Chris Miller 22k

0

Entering edit mode

Thank you Chris,

R-code:

library(sciClone)

#read in vaf data from three related tumors
#format is 5 column, tab delimited: 
#chr, pos, ref_reads, var_reads, vaf

v1 = read.table("Sample1_recal.bam-0209.snp.somaticfiltered.Somatic",header=T);
v1 = v1[,c(1,2,9,10,24)]
v2 = read.table("Sample2_recal.bam-0209.snp.somaticfiltered.Somatic",header=T);
v2 = v2[,c(1,2,9,10,24)] #to get the chromosome, starting position, Reads supporting reference in tumor, Reads supporting variant in tumor, and the Variant allele frequency in tumor*100 as the origin vaf is in precentage and sciclone accept 1-100 but not 1%-100%

cn1 = read.table("Sample1_recal.bam-0209-1.copynumber",header=T);
cn1 = cn1[,c(1,2,3,7)]
cn2 = read.table("Sample2_recal.bam-0209-1.copynumber",header=T);
cn2 = cn2[,c(1,2,3,7)] #to get the chromosome, starting position, ending position and the log2 ratio

names = c("Sample1", "Sample2")
sc = sciClone(vafs=list(v1,v2),
              copyNumberCalls=list(cn1,cn2),
              sampleNames=names[1:2],
              cnCallsAreLog2=TRUE,
              minimumDepth=10,
              maximumClusters=7)
writeClusterTable(sc, "results/clustersSample1_vs_Sample2-0210")
sc.plot1d(sc,"results/clustersSample1_vs_Sample2-0210_1d.pdf")
sc.plot2d(sc,"results/clustersSample1_vs_Sample2-0210_2d.pdf")

The first few lines of my input files are SNV for sample 1:

chrom     position     ref     var     normal_reads1     normal_reads2     normal_var_freq     normal_gt     tumor_reads1     tumor_reads2     tumor_var_freq     tumor_gt     somatic_status     variant_p_value     somatic_p_value     tumor_reads1_plus     tumor_reads1_minus     tumor_reads2_plus     tumor_reads2_minus     normal_reads1_plus     normal_reads1_minus     normal_reads2_plus     normal_reads2_minus     precentage
chr1      500630       A       G       22                0                 0%                  A             8                4                33.33%             R            Somatic            1                   0.010674            0                     8                      0                     4                      2                      20                      0                      0                       33.33
chr1      1221916      T       C       84                10                10.64%              T             203              52               20.39%             Y            Somatic            1                   0.02198             95                    108                    39                    13                     36                     48                      7                      3                       20.39
chr1      3001832      C       A       20                5                 20%                 M             36               39               52%                M            Somatic            1                   0.004402            8                     28                     5                     34                     4                      16                      1                      4                       52
chr1      4888019      T       C       22                4                 15.38%              T             23               22               48.89%             Y            Somatic            1                   0.004173            1                     22                     1                     21                     1                      21                      0                      4                       48.89

and for copynumber sample1

chrom     chr_start     chr_stop     num_positions     normal_depth     tumor_depth     log2_ratio     gc_content
chr1      10920         12948        29                10.9             11.8            0.11           48.3
chr1      11959         12977        19                10.4             9.5             -0.13          63.2
chr1      12999         13079        100               19.7             16.2            -0.278         63

However, I noticed that when I open the copynumber file the file cannot be completely load as it has way to many records (I use excel to open it) so will that be a possible reason to have those error messages?

Thanks

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by sxl919 ▴ 10

0

Entering edit mode

Is there any output above the error message you pasted? Maybe something like: ERROR: only N points - not enough points to cluster when using 7 intialClusters. Provide more data or reduce your maximumClusters option

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Chris Miller 22k

0

Entering edit mode

I use 10, which is the default and 7 for the cluster, it shows

ERROR: only 7 points 0 not enough points to cluster when using 10 intialClusters. Provide more data or 
reduce your maximumClusters option

(I miss that, sorry)

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by sxl919 ▴ 10

Ram · Answer 1 · 2015-02-10

0

Entering edit mode

9.8 years ago

Chris Miller 22k

Okay, based on the info you've provided above, the problem is that you only have 7 copy-number neutral points with adequate depth in this sample. If that's accurate, then you're not going to be able to get any kind of meaningful clonality analysis from 7 points.

You might also double check your data to see if it's a mistake. If this tumor isn't heavily CN-altered and has adequate depth for most points, then you should make sure that your CN values are correct and possibly tweak the copyNumberMargins param.

ADD COMMENT • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by Chris Miller 22k

0

Entering edit mode

Thank you very much. I will double check the data

ADD REPLY • link updated 2.6 years ago by Ram 44k • written 9.8 years ago by sxl919 ▴ 10

Ram · Answer 2 · 2015-02-13

Hi Chris,

Sorry to bother you again. I actually found a similar post in Biostar: sciClone error for two sample

In that post, you said we have to merge the calls and pull readcounts for every site in both samples in order to generate the 2d plot so the input file should look like this

Tumor 1:
    1    111   10  10   50.0
    1    999    6    6   50.0
    2    222    9    9    50.0
    2    888    5    5    50.0

    Tumor 2:
    1    111    4    4    50.0
    1    999    8    8    50.0
    2    222    3    3    50.0
    2    888    7    7    50.0

Yet, how can I do that? Since if I use compare (with the option merge) in Varscan I will just end up with 2 sample calls merge together, which look like this

1    111   10  10   50.0
1    999    8   8    50.0
2    222    9    9    50.0
2    888    7   7    50.0

By the way, my calls was generated by doing the comparison between normal and tumor1 + normal and tumor 2