Question

How is the "weight" calculated by CNVkit ?

7.8 years ago

Hällyss ▴ 90

Hello,

We are looking for a score that can eliminate many false-positive (FP) calls in CNVkit results, and we think the weight could be used for this. We would like to know what this weight means and, ideally, the equation behind it. In the CNVkit manual, we found this extract:

A weight is assigned to each remaining bin depending on:

The size of the bin;
The deviation of the bin’s log2 value in the reference from 0;
The “spread” of the bin in the reference.

(The latter two only apply if at least one normal/control sample was used to build the reference.)

So we have several questions:

What does "bin" mean? The segment? The bins in my_target.bed? In my_antitarget.bed?
What is the "spread"? The number of bins in the segment? The length of the segment? Something else?
The score seems to be strongly affected by segment size and/or the number of bins in the segment; is this the case?
Is it possible to get an equation for the weight?

Thank you

Alice

cnvkit weight cnv • 4.6k views

Updated 7.8 years ago by Eric T. ★ 2.8k • written 7.8 years ago by Hällyss ▴ 90

You wrote in the tags which tool this question is about, but that would have been useful information in your post as well.

7.8 years ago by WouterDeCoster 47k

My apologies, and thank you for your answer.

7.8 years ago by Hällyss ▴ 90

score 6 · Accepted Answer · 2017-03-04

So:

Bins are the unsegmented regions seen in my_target.bed, my_antitarget.bed, and the .cnn and .cnr files emitted by CNVkit. Sometimes also called "probes" in the code.
Spread is the statistical spread of coverages in a bin observed across all of the samples in your pooled reference, similar to standard deviation but calculated differently to be more robust to outliers.
The weight listed in the segmented .cns files is the sum of the weights of the bins/probes spanned by the segment. It correlates with segment length and number of bins, but will be a bit lower if the segment covers a region with less reliable sequencing coverage or mapping (i.e. lower-weight bins).
The calculation is in the function cnvlib.fix.apply_weights. It is not one equation; it depends on which data sources are available, mainly whether the reference is paired, pooled, or flat.
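
To make the third point concrete, here is a minimal sketch of how a segment's weight follows from summing the weights of the bins it spans. The bin coordinates, weights, and the helper `segment_weight` are made up for illustration; this is not CNVkit's own code, and it does not reproduce how the per-bin weights themselves are computed in cnvlib.fix.apply_weights.

```python
# Hypothetical bins: (chromosome, start, end, weight), loosely mirroring
# the columns of a .cnr file. All values are invented for illustration.
bins = [
    ("chr1", 100, 200, 0.9),
    ("chr1", 200, 300, 0.8),
    ("chr1", 300, 400, 0.2),  # low-weight bin, e.g. unreliable coverage
    ("chr1", 400, 500, 0.9),
    ("chr2", 100, 200, 0.7),
]

# Hypothetical segments: (chromosome, start, end), as in a .cns file.
segments = [
    ("chr1", 100, 300),
    ("chr1", 300, 500),
    ("chr2", 100, 200),
]


def segment_weight(segment, bins):
    """Sum the weights of all bins fully contained in the segment."""
    chrom, seg_start, seg_end = segment
    return sum(
        weight
        for bin_chrom, bin_start, bin_end, weight in bins
        if bin_chrom == chrom and bin_start >= seg_start and bin_end <= seg_end
    )


segment_weights = [segment_weight(seg, bins) for seg in segments]

for seg, total in zip(segments, segment_weights):
    print(seg, round(total, 3))
```

The two chr1 segments have the same length and the same number of bins, yet the second gets a lower weight because it contains a low-weight bin. This is why the segment weight tracks segment size and bin count without being determined by them alone.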

Also see cnvkit.py segmetrics --ci and cnvkit.py call --filter ci for filtering out potential FP segments by calculating confidence intervals for each segment's mean log2 ratio.
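
The idea behind filtering on confidence intervals can be sketched as follows: if the interval for a segment's mean log2 ratio includes 0, the segment is hard to distinguish from copy-neutral. The snippet below is only an illustration under assumptions: it uses a plain percentile bootstrap over bins with a weighted mean, and the function names (`bootstrap_ci`, `ci_excludes_zero`) and the data are hypothetical. It is not how CNVkit implements segmetrics or call.

```python
import random


def weighted_mean(values, weights):
    """Weighted average; weights are assumed to be positive."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_ci(log2_ratios, weights, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a segment's weighted mean log2."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(log2_ratios)
    means = []
    for _ in range(n_boot):
        # Resample the segment's bins with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        means.append(
            weighted_mean([log2_ratios[i] for i in idx], [weights[i] for i in idx])
        )
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def ci_excludes_zero(ci):
    """True if the whole interval lies above or below 0."""
    lower, upper = ci
    return lower > 0 or upper < 0


# Invented bin-level log2 ratios and weights for two segments.
gain_log2 = [0.55, 0.60, 0.50, 0.62, 0.58, 0.53]
gain_weights = [0.9, 0.8, 0.9, 0.7, 0.8, 0.9]
noisy_log2 = [0.40, -0.50, 0.30, -0.35, 0.10, -0.05]
noisy_weights = [0.9, 0.8, 0.9, 0.7, 0.8, 0.9]

gain_ci = bootstrap_ci(gain_log2, gain_weights)
noisy_ci = bootstrap_ci(noisy_log2, noisy_weights)

print("gain segment kept:", ci_excludes_zero(gain_ci))
print("noisy segment kept:", ci_excludes_zero(noisy_ci))
```

In this toy data the consistently elevated segment keeps an interval entirely above 0, while the segment whose bins scatter around 0 does not, which is the kind of call one would want to treat as a likely FP.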