ExomeDepth: Questions Related To The Warning Messages And How To Select Reference Exomes
10.9 years ago
yyaobo ▴ 30

After searching for a while for answers to my questions about ExomeDepth, I thought this would be the best place to ask. I have two questions about using the software:

1: How many reference exomes should I use? I have 30 samples in the same batch. Should I use as many samples as possible as references, or just 5-10 of them, as the manual suggests?

2: After I selected the reference panel with the code:

my.choice <- select.reference.set(test.counts = my.test,
                                  reference.counts = my.reference.set,
                                  bin.length = (ExomeCount.dafr$end - ExomeCount.dafr$start)/1000,
                                  n.bins.reduced = 10000)

and printed the selected samples with print(my.choice[[1]]), only two sample names were printed, although there are ten samples in my.reference.set. How could this happen? There were also 3 identical warning messages while fitting the model and computing the likelihoods:

Warning message: In aod::betabin(data = data, formula = as.formula(formula), random = ~1, : The data set contains at least one line with weight = 0.

As a beginner in R, I cannot understand the warning message. Could you please help me clarify it?

Thanks a lot.

ngs exome • 4.3k views
10.9 years ago

The main contribution of ExomeDepth is really to select that reference for you. So you can provide all 30, and select.reference.set will pick the best ones for you. So go for 30 and see what the software decides to pick.

Now if you print the reference samples and see only 2 names, it is probably because ExomeDepth decided that, out of the 10 you provided, the combination of these 2 samples was the best choice. This seems low, but it is perhaps a consequence of limited correlations between pairs of exomes. I suggest you do it again with all 30 and see what ExomeDepth suggests to pick. I'd be happier if ExomeDepth found it optimal to pick 5-6, but I guess it's best to let the algorithm choose. The heuristics should be OK.
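To illustrate the idea of ranking candidate references by how well they correlate with the test exome, here is a minimal base R sketch. The toy count matrix, the sample names and the plain `cor()` ranking are assumptions made for illustration. This is not ExomeDepth's internal code, which also fits a beta-binomial model to decide how many of the ranked samples to keep.

```r
set.seed(1)
n.bins <- 200

# Hypothetical toy data: a shared per-bin depth profile plus sample-specific noise
profile  <- rgamma(n.bins, shape = 5, rate = 0.05)
noise.sd <- c(test = 0.05, refA = 0.05, refB = 0.10, refC = 0.30)
counts   <- sapply(noise.sd, function(s) rpois(n.bins, profile * exp(rnorm(n.bins, sd = s))))

my.test <- counts[, "test"]
# The test sample itself is left out of the candidate references
candidates <- counts[, colnames(counts) != "test", drop = FALSE]

# Rank candidates by their correlation with the test sample
correlations <- apply(candidates, 2, cor, y = my.test)
ranked <- sort(correlations, decreasing = TRUE)
print(ranked)
```

With real data, the equivalent of `candidates` would be the count matrix passed as `reference.counts`, so passing every other sample in the batch simply gives the selection step more candidates to rank.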

One more thing: when you run the CNV calling function in the latest ExomeDepth 1.0, there should now be a message that reports the correlation between the test and the reference exome. Hopefully that number is very high (> 0.98 or so). You may want to let me know what you get, just to see what to expect.
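If you want to check such a correlation by hand, a rough base R sketch follows. It assumes the aggregate reference is the per-bin sum of counts over the chosen reference samples, and it uses simulated counts with hypothetical sample names. The number ExomeDepth prints may be computed somewhat differently, so treat this as a sanity check only.

```r
set.seed(42)
n.bins <- 500

# Hypothetical per-bin expected depth, shared by all simulated samples
profile  <- rgamma(n.bins, shape = 5, rate = 0.05)
simulate <- function(sd) rpois(n.bins, profile * exp(rnorm(n.bins, sd = sd)))

my.test <- simulate(0.05)
reference.matrix <- cbind(refA = simulate(0.05),
                          refB = simulate(0.05),
                          refC = simulate(0.05))

# Aggregate reference: per-bin sum of counts over the chosen reference samples
my.reference.selected <- rowSums(reference.matrix)

agg.cor <- cor(my.test, my.reference.selected)
cat("Correlation between test and aggregate reference:", round(agg.cor, 4), "\n")
cat("Above 0.98:", agg.cor > 0.98, "\n")
```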

You can ignore the warning message; it is not important. I should find a way to remove it.
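For anyone curious about where the warning comes from: as far as I understand, the "weight" of a row in the beta-binomial fit is its binomial denominator, which here would be the test count plus the reference count for that bin. A bin with no reads in either would then have weight 0. The sketch below uses made-up counts and base R only to show how to count such bins. The interpretation is an assumption, not something confirmed in this thread.

```r
# Hypothetical read counts per bin (exon) for a test sample and an aggregate reference
test.counts      <- c(120, 0, 45, 0, 300)
reference.counts <- c(480, 0, 160, 12, 1100)

# Binomial denominator per bin; a bin where both counts are 0 carries no information
weight      <- test.counts + reference.counts
zero.weight <- weight == 0

cat("Bins with weight 0:", sum(zero.weight), "of", length(weight), "\n")
print(which(zero.weight))
```

Such bins contribute nothing to the fit, which is consistent with the advice that the warning can be ignored.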

10.9 years ago
yyaobo ▴ 30

Hi Vincent,

This is what I get when I use all of the other samples. It seems that only the first 10 or 11 samples were used in the calculation, and the choice of reference is limited to those samples. Is this a bug, or did I do something wrong?

> my.choice <- select.reference.set (test.counts = my.test,
+ reference.counts = ExomeCount.mat[,2:30],
+ bin.length = (ExomeCount.dafr$end - ExomeCount.dafr$start)/1000,
+ n.bins.reduced = 10000)
Optimization of the choice of aggregate reference set
Number of selected bins: 10000
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Now fitting the beta-binomial model: this step can take a few minutes.
Now computing the likelihood for the different copy number states
Warning message:
In aod::betabin(data = data, formula = as.formula(formula), random = ~1,  :
  The data set contains at least one line with weight = 0.

> my.choice
$reference.choice
[1] "F03665"
[2] "F00603"
[3] "F03566"
[4] "F05163"

$summary.stats
         ref.samples
F03665     F03665
F00603     F00603
F03566     F03566
F05163     F05163
F03450_6 F03450_6
F05506     F05506
F07191     F07191
F09591     F09591
F98407     F98407
F06341     F06341
F98212     F98212
F0852       F0852
D134130   D134130
F99359     F99359
F99129     F99129
F03513     F03513
F08560     F08560
F09508     F09508
F13349     F13349
F08196     F08196
F98468     F98468
F08744     F08744
F11151     F11151
F10149     F10149
F02209     F02209
F05460     F05460
D180850   D180850
F03450_8 F03450_8
F09507     F09507
         correlations expected.BF
F03665      0.9900229    1.846178
F00603      0.9877724    2.201156
F03566      0.9872823    2.372635
F05163      0.9856162    2.514658
F03450_6    0.9850413    2.471248
F05506      0.9844204    2.470111
F07191      0.9839191    2.413353
F09591      0.9839002    2.473522
F98407      0.9832699    2.448504
F06341      0.9824921    2.424740
F98212      0.9812242          NA
F0852       0.9804988          NA
D134130     0.9801291          NA
F99359      0.9797175          NA
F99129      0.9785509          NA
F03513      0.9778758          NA
F08560      0.9776727          NA
F09508      0.9771703          NA
F13349      0.9767320          NA
F08196      0.9751705          NA
F98468      0.9743046          NA
F08744      0.9741085          NA
F11151      0.9732620          NA
F10149      0.9726253          NA
F02209      0.9725516          NA
F05460      0.9702661          NA
D180850     0.9685933          NA
F03450_8    0.9673487          NA
F09507      0.9553400          NA
                  phi  RatioSd     mean.p
F03665   0.0044527603 1.444838 0.33800956
F00603   0.0029197167 1.465977 0.21005173
F03566   0.0020878506 1.475746 0.14665704
F05163   0.0015396843 1.460373 0.11260082
F03450_6 0.0014101618 1.495717 0.09433444
F05506   0.0012403752 1.510007 0.08016918
F07191   0.0011619891 1.540403 0.06996240
F09591   0.0010043407 1.524936 0.06265804
F98407   0.0009342704 1.540521 0.05622814
F06341   0.0008792117 1.553603 0.05138484
F98212   0.0008246772 1.575700 0.04592303
F0852              NA       NA         NA
D134130            NA       NA         NA
F99359             NA       NA         NA
F99129             NA       NA         NA
F03513             NA       NA         NA
F08560             NA       NA         NA
F09508             NA       NA         NA
F13349             NA       NA         NA
F08196             NA       NA         NA
F98468             NA       NA         NA
F08744             NA       NA         NA
F11151             NA       NA         NA
F10149             NA       NA         NA
F02209             NA       NA         NA
F05460             NA       NA         NA
D180850            NA       NA         NA
F03450_8           NA       NA         NA
F09507             NA       NA         NA
         median.depth selected
F03665          157.0    FALSE
F00603          300.0    FALSE
F03566          466.0    FALSE
F05163          630.0     TRUE
F03450_6        770.0    FALSE
F05506          919.5    FALSE
F07191         1066.0    FALSE
F09591         1198.0    FALSE
F98407         1346.0    FALSE
F06341         1480.5    FALSE
F98212         1668.0    FALSE
F0852              NA    FALSE
D134130            NA    FALSE
F99359             NA    FALSE
F99129             NA    FALSE
F03513             NA    FALSE
F08560             NA    FALSE
F09508             NA    FALSE
F13349             NA    FALSE

Yes, at some point adding more samples is not useful, and the algorithm stops optimizing. So no worries here: it stops after 11 samples or so and selects 4 of them as the reference. It all makes sense.
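One way to read the output above is that the selected reference set is the first k samples in the correlation ranking, where k is the row with the highest expected.BF. That reading is an assumption based on the printed table, not a statement about the package internals. A small base R check, using the ref.samples and expected.BF values copied from the output in this thread:

```r
# Values copied from the summary.stats output above (first 11 rows)
summary.stats <- data.frame(
  ref.samples = c("F03665", "F00603", "F03566", "F05163", "F03450_6",
                  "F05506", "F07191", "F09591", "F98407", "F06341", "F98212"),
  expected.BF = c(1.846178, 2.201156, 2.372635, 2.514658, 2.471248,
                  2.470111, 2.413353, 2.473522, 2.448504, 2.424740, NA),
  stringsAsFactors = FALSE
)

# Row with the largest expected Bayes factor; which.max skips NA entries
best <- which.max(summary.stats$expected.BF)

# Under the assumption above, the reference set is the first 'best' samples
reference.choice <- summary.stats$ref.samples[seq_len(best)]
print(best)
print(reference.choice)
```

Here `best` is 4, which matches the row flagged TRUE in the selected column and the four names in `$reference.choice`.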
