How is the "weight" calculated by CNVkit?
7.8 years ago
Hällyss ▴ 90

Hello,

We are looking for a score that can eliminate many false-positive (FP) calls in CNVkit results, and we think the weight could be used for this. We would like to know what this weight means and, ideally, the equation behind it. In the CNVkit manual, we found this extract:

A weight is assigned to each remaining bin depending on:

  1. The size of the bin;
  2. The deviation of the bin’s log2 value in the reference from 0;
  3. The “spread” of the bin in the reference.

(The latter two only apply if at least one normal/control sample was used to build the reference.)

So, we have several questions:

  • What does "bin" mean? The segment? A bin in my_target.bed? In my_antitarget.bed?
  • What is the "spread"? The number of bins in the segment? The length of the segment? Something else?
  • The score seems to be strongly affected by segment size and/or the number of bins in the segment. Is this the case?
  • Is it possible to get an equation for the weight?

Thank you

Alice

cnvkit weight cnv • 4.6k views

You wrote in the tags which tool this question is about, but that would have been useful information in your post as well.


My apologies, and thank you for your answer.

7.8 years ago
Eric T. ★ 2.8k

So:

  • Bins are the unsegmented regions seen in my_target.bed, my_antitarget.bed, and the .cnn and .cnr files emitted by CNVkit. Sometimes also called "probes" in the code.
  • Spread is the statistical spread of coverages in a bin observed across all of the samples in your pooled reference, similar to standard deviation but calculated differently to be more robust to outliers.
  • The weight listed in the segmented .cns files is the sum of the weights of the bins/probes spanned by the segment. It correlates with segment length and number of bins, but will be a bit lower if the segment covers a region with less reliable sequencing coverage or mapping (i.e. lower-weight bins).
  • The calculation is in the function cnvlib.fix.apply_weights. It is not one equation; it depends on which data sources are available, mainly whether the reference is paired, pooled, or flat.
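
To illustrate the third point, here is a minimal sketch of how a segment's weight could be obtained by summing the weights of the bins it spans. This is not CNVkit's code: the `Bin` class, the `segment_weight` helper, and all coordinates and weights are made up for illustration, and the rule "a bin counts if it lies fully inside the segment" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Hypothetical stand-in for one row of a .cnr file."""
    chrom: str
    start: int
    end: int
    weight: float


def segment_weight(bins, chrom, seg_start, seg_end):
    """Sum the weights of the bins lying fully inside the segment (assumed rule)."""
    return sum(
        b.weight
        for b in bins
        if b.chrom == chrom and b.start >= seg_start and b.end <= seg_end
    )


# Made-up bins: the third one sits in a poorly covered region (low weight).
bins = [
    Bin("chr1", 0, 100, 0.9),
    Bin("chr1", 100, 200, 0.8),
    Bin("chr1", 200, 300, 0.2),
    Bin("chr1", 300, 400, 0.9),
]

# Two segments of equal length and equal bin count.
reliable = segment_weight(bins, "chr1", 0, 200)
noisy = segment_weight(bins, "chr1", 200, 400)
print(round(reliable, 2), round(noisy, 2))
```

Both segments span two bins, yet the one containing the low-weight bin ends up with a lower total, which is the effect described above.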

Also see cnvkit.py segmetrics --ci and cnvkit.py call --filter ci for filtering out potential FP segments by calculating confidence intervals for each segment's mean log2 ratio.
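
The idea behind confidence-interval filtering can be sketched as follows: a segment is treated as a likely real change only if the confidence interval of its mean log2 ratio excludes zero. This is a simplified illustration, not CNVkit's implementation; the field names (`ci_lo`, `ci_hi`) and the values are hypothetical, and the actual filter may handle rejected segments differently (for example, by merging them with neighbors instead of dropping them).

```python
def ci_excludes_zero(ci_lo, ci_hi):
    """True if the interval [ci_lo, ci_hi] lies entirely above or below 0."""
    return ci_lo > 0 or ci_hi < 0


# Made-up segments with a mean log2 ratio and a confidence interval.
segments = [
    {"name": "seg_a", "log2": 0.45, "ci_lo": 0.30, "ci_hi": 0.60},
    {"name": "seg_b", "log2": 0.10, "ci_lo": -0.15, "ci_hi": 0.35},
    {"name": "seg_c", "log2": -0.80, "ci_lo": -1.00, "ci_hi": -0.55},
]

# Keep only segments whose interval does not overlap zero.
kept = [s["name"] for s in segments if ci_excludes_zero(s["ci_lo"], s["ci_hi"])]
print(kept)
```

Here `seg_b` is rejected because its interval spans zero, so its apparent gain is not distinguishable from neutral copy number.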
