Question

CNVkit Segments vs Bins

7.7 years ago

Lauren ▴ 70

After reading the paper and docs, I am having a little trouble understanding the difference between calling at the segment level and at the bin level, and when to use each. I am running CNVkit on human exome sequencing data.

My questions are:

What use case would the segment-level calls be best for?

What use case would the bin-level calls be best for?

Which one is more accurate, and in what context?

What is the point of having two? Is the bin-level output noisier but at a higher resolution than the segment-level output? I am assuming default segmentation and bin sizes. I'm really confused here -- anything helps!

CNVkit copy number exome segment • 4.1k views

updated 7.7 years ago by Eric T. ★ 2.8k • written 7.7 years ago by Lauren ▴ 70

score 4 · Accepted Answer · 2017-03-07

The two file types might be more intuitive if you have experience with an older microarray-based method, array comparative genomic hybridization (aCGH). Bins are equivalent to microarray probes there.

The bins provide a fine-grained genome-wide copy number signal plus some noise. Segmentation attempts to remove the noise and infer the location of discrete copy number alterations, i.e. the individual regions that have been duplicated or deleted. Segments whose log2 ratio departs from the neutral value (log2=0, i.e. diploid) are putative copy number alterations. CNVkit's call command helps infer more about the segments beyond their breakpoints.
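To make the "signal plus noise" idea concrete, here is a toy sketch in Python. All of the numbers are made up, the breakpoint is given by hand rather than found by a segmentation algorithm, and the 0.2 cutoff is only an illustrative tolerance, not a CNVkit default. It just shows how averaging the bins within a segment suppresses the per-bin noise and leaves one log2 value per region.

```python
from statistics import mean

# Toy bin-level log2 ratios along one chromosome (made-up numbers):
# a neutral stretch followed by a gained stretch, each with some noise.
bins = [0.05, -0.08, 0.02, -0.03, 0.07, 0.61, 0.52, 0.66, 0.55, 0.58]

# Hypothetical breakpoint, supplied by hand for this illustration.
breakpoint_index = 5
segments = [bins[:breakpoint_index], bins[breakpoint_index:]]

# Illustrative tolerance around log2 = 0; not a CNVkit default.
NEUTRAL_TOLERANCE = 0.2


def summarize(segment_bins, tolerance=NEUTRAL_TOLERANCE):
    """Collapse the noisy bins of one segment into a single log2 value."""
    seg_log2 = mean(segment_bins)
    return {
        "n_bins": len(segment_bins),
        "log2": seg_log2,
        "putative_cna": abs(seg_log2) > tolerance,
    }


summaries = [summarize(s) for s in segments]
for s in summaries:
    print(s)
```

In this toy case the second segment averages close to log2(3/2), which is what a single-copy gain looks like in a pure diploid sample, while the first averages close to 0 even though no individual bin is exactly 0.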

So, in general, use the segments (.cns) for most follow-up analysis. The bin-level data (.cnr) is useful for plotting and showing the level of support for each segment, and for tracking down potential artifacts like especially noisy regions of the genome. Also, in the case of a small or single-exon CNA, segmentation will typically not report it as its own segment in the .cns file, but the .cnr file may show some evidence for the copy number change that you could then look to confirm independently.
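As a rough sketch of how you might use the two files together, the Python below counts the bins supporting each segment, measures their spread, and lists bins that sit far from their segment's log2 value (candidates for a small CNA that segmentation smoothed over). It uses only the standard library and made-up miniature file contents. It assumes tab-separated text with at least the chromosome, start, end, gene and log2 columns; the function names and the 0.5 outlier cutoff are hypothetical choices for illustration, not part of CNVkit.

```python
import csv
import io
from statistics import median

# Made-up miniature .cnr (bins) and .cns (segments) contents, tab-separated.
# Only the columns used below are included; real files carry more columns.
CNR_TEXT = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr1\t100\t200\tGENE_A\t0.03\n"
    "chr1\t200\t300\tGENE_A\t-0.05\n"
    "chr1\t300\t400\tGENE_A\t0.02\n"
    "chr1\t400\t500\tGENE_B\t-0.90\n"
    "chr1\t500\t600\tGENE_C\t0.04\n"
    "chr1\t600\t700\tGENE_C\t-0.02\n"
    "chr1\t700\t800\tGENE_D\t0.55\n"
    "chr1\t800\t900\tGENE_D\t0.62\n"
    "chr1\t900\t1000\tGENE_D\t0.57\n"
)
CNS_TEXT = (
    "chromosome\tstart\tend\tgene\tlog2\n"
    "chr1\t100\t700\tGENE_A,GENE_B,GENE_C\t0.01\n"
    "chr1\t700\t1000\tGENE_D\t0.58\n"
)

# Hypothetical cutoff: how far a bin must sit from its segment to be flagged.
OUTLIER_DELTA = 0.5


def read_rows(text):
    """Parse tab-separated rows, converting coordinates and log2 to numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "chromosome": row["chromosome"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "gene": row["gene"],
            "log2": float(row["log2"]),
        })
    return rows


def segment_support(segment, bins, delta=OUTLIER_DELTA):
    """Summarize the bins that fall entirely inside one segment."""
    covered = [
        b for b in bins
        if b["chromosome"] == segment["chromosome"]
        and b["start"] >= segment["start"]
        and b["end"] <= segment["end"]
    ]
    values = [b["log2"] for b in covered]
    if not values:
        return {"n_bins": 0, "mad": None, "outlier_genes": []}
    center = median(values)
    # Median absolute deviation: a noise estimate that tolerates outliers.
    mad = median([abs(v - center) for v in values])
    outliers = [b["gene"] for b in covered
                if abs(b["log2"] - segment["log2"]) > delta]
    return {"n_bins": len(values), "mad": mad, "outlier_genes": outliers}


bins = read_rows(CNR_TEXT)
segments = read_rows(CNS_TEXT)
support = [segment_support(seg, bins) for seg in segments]
for seg, info in zip(segments, support):
    print(seg["chromosome"], seg["start"], seg["end"], seg["log2"], info)
```

With real data you would pass the contents of your own .cnr and .cns files to `read_rows` instead of the inline strings. In the toy data, the single GENE_B bin inside the otherwise neutral first segment is the kind of signal that only shows up at the bin level and would need independent confirmation.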