Question

Weights in cnvkit cnr output


5.5 years ago

biologist ▴ 20

Hello everyone,

I have read several Biostars posts about the weight column in CNVkit's .cnr output but am still confused. From https://cnvkit.readthedocs.io/en/stable/pipeline.html, I understand that the weights in a .cnr file depend on the size of the bin, the deviation of the bin's log2 value in the reference from 0, and the "spread" of the bin in the reference. I am not particularly familiar with these terms. Could someone explain what exactly they mean, and whether I can take the .cnr weight column as a measure of how confident we can be about a particular CNV call?

Also, suppose I want to look for a CNV in a particular range of a chromosome that is covered by, for example, 5 rows of the .cnr output. Do I take the average of the 5 log2 values but the sum of the 5 weight values? I am asking because the post "How is the "weight" calculated by CNVkit ?" says that the .cns file takes the sum of the weights of all the bins that make up a segment. In that case, would a higher weight be equivalent to a more confident call?

Any advice is appreciated, thanks!

cnvkit • 2.1k views

updated 3.5 years ago by Chan ▴ 40 • written 5.5 years ago by biologist ▴ 20

score 1 · Answer 1 · 2019-07-28

The bin weights are correlated with the stability of the copy number signal in that bin's genomic region, as estimated from a pool of control samples (if available) and some other heuristics. In the development version of CNVkit currently on GitHub, the bin weights are more directly an estimate of 1 / variance in log2 coverage ratios at that site.
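To make the "1 / variance" idea concrete, here is a minimal sketch in plain Python. It is not CNVkit's code: the bin names, the control log2 values, and the helper `inverse_variance_weight` are all made up for illustration, and CNVkit's actual weights involve additional heuristics and scaling.

```python
from statistics import variance

# Hypothetical log2 coverage ratios for two bins across a pool of
# five control samples (values invented for illustration only).
control_log2 = {
    "stable_bin": [0.02, -0.01, 0.03, -0.02, 0.00],
    "noisy_bin": [0.40, -0.35, 0.25, -0.30, 0.10],
}


def inverse_variance_weight(values):
    """Return 1 / sample variance of the log2 ratios at one bin."""
    return 1.0 / variance(values)


# A bin whose coverage barely moves between controls gets a large weight;
# a bin that swings widely gets a small one.
weights = {name: inverse_variance_weight(vals) for name, vals in control_log2.items()}

for name, w in weights.items():
    print(f"{name}: weight = {w:.2f}")
```

Under this reading, a weight describes how noisy a genomic region tends to be across controls, not how strong the evidence is for a call in one particular sample.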

I wouldn't use the weights directly as an estimate of confidence. Instead I'd use the segmetrics command to calculate a confidence interval for each segment. In the development version there's also an option to do a one-sample t-test of a segment's bins versus neutral copy number (log2 = 0), which gives you a p-value, if that's what you need.
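For the five-bin case in the question, the general idea can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not what segmetrics does internally: the (log2, weight) pairs are invented, and `weighted_mean`, `bootstrap_ci`, and `t_statistic` are hypothetical helpers. The weights are used to average the log2 values, and the confidence comes from the spread of the bins, not from the sum of the weights.

```python
import random
from math import sqrt
from statistics import mean, stdev

# Hypothetical (log2, weight) pairs for five .cnr bins covering a region.
bins = [(-0.52, 0.90), (-0.47, 0.85), (-0.60, 0.40), (-0.44, 0.95), (-0.55, 0.70)]


def weighted_mean(pairs):
    """Weighted average of log2 values, using the bin weights."""
    total_w = sum(w for _, w in pairs)
    return sum(x * w for x, w in pairs) / total_w


def bootstrap_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the weighted mean log2."""
    rng = random.Random(seed)
    estimates = sorted(
        weighted_mean([rng.choice(pairs) for _ in pairs]) for _ in range(n_boot)
    )
    low = estimates[int(n_boot * alpha / 2)]
    high = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return low, high


def t_statistic(values, mu=0.0):
    """One-sample t statistic of the bin log2 values against mu (neutral = 0)."""
    return (mean(values) - mu) / (stdev(values) / sqrt(len(values)))


region_log2 = weighted_mean(bins)
ci_low, ci_high = bootstrap_ci(bins)
t_stat = t_statistic([x for x, _ in bins])

print(f"weighted mean log2 = {region_log2:.3f}")
print(f"95% bootstrap CI   = ({ci_low:.3f}, {ci_high:.3f})")
print(f"t statistic vs 0   = {t_stat:.2f}")
```

If the interval excludes 0, the region's log2 ratio is distinguishable from neutral copy number in this toy setting. Turning the t statistic into a p-value needs the t distribution with n - 1 degrees of freedom, which the standard library does not provide, so that step is left out here.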