CNVkit: Choice of Bin Size and CNV Calling
Asked 5.8 years ago by wei.wei

I noticed that the choice of target bin size has a great impact on the results. I ran a sample with the default target bin size of 5,000 bp, and cnvkit.py genemetrics sample.cnr reported zero gene-level gains or losses. However, for the same sample, when I reduced the bin size to 1,000 bp, gains and losses were reported, some with strongly negative log2 values.
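
For concreteness, the two runs looked roughly like this (file names are placeholders, and I'm assuming a flat reference here; --target-avg-size is how I set the bin size):

    # Run with 5 kb bins (placeholder file names)
    cnvkit.py batch sample.bam -m wgs -n -f reference.fa \
        --annotate refFlat.txt --target-avg-size 5000 -d out_5k/
    cnvkit.py genemetrics out_5k/sample.cnr    # reported no gains/losses

    # Same sample with 1 kb bins
    cnvkit.py batch sample.bam -m wgs -n -f reference.fa \
        --annotate refFlat.txt --target-avg-size 1000 -d out_1k/
    cnvkit.py genemetrics out_1k/sample.cnr    # reported gains and losses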

I was just wondering: what would be a good choice of bin size? How can I ensure that I don't miss important gains and losses while also minimizing false positives? Will the calculated default always be a reasonable choice, or are there other factors I should take into consideration?

I hope the question is clear. Thank you.

Tags: cnvkit, genemetrics
Answered 5.8 years ago

It's a tradeoff: larger bin sizes reduce noise, while smaller bin sizes increase sensitivity. This figure (panels C and D), along with the text immediately below it, may offer some clarity on that point: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0016327#pone-0016327-g001
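
A rough back-of-the-envelope way to see the noise side of that tradeoff (my own approximation, not from the linked paper): if per-bin read counts are approximately Poisson with mean N, the delta method gives

    SD(log2 ratio per bin) ≈ 1 / (ln 2 · sqrt(N))

So at 30x coverage with 100 bp reads, a 1 kb bin holds ~300 reads (SD ≈ 0.08 log2 units), while a 10 kb bin holds ~3,000 (SD ≈ 0.03). Tenfold larger bins cut per-bin noise by about sqrt(10) ≈ 3x, at the cost of averaging away any event shorter than the bin.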

In general, given a 30x whole genome, 10 kb is a good size for high-confidence calls, but 1 kb can also be perfectly acceptable if proper filtering is used.
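
To make "proper filtering" concrete, here is a sketch of how it might look with CNVkit itself (the thresholds below are arbitrary starting points, not official recommendations, and I believe autobin can suggest a bin size straight from the BAM):

    # Let CNVkit recommend a bin size from the BAM's actual coverage --
    # essentially the "calculated default" the question mentions
    cnvkit.py autobin sample.bam -m wgs

    # With small bins, require a larger log2 change (-t) and more
    # supporting bins per gene (-m), and summarize over segments (-s)
    # rather than raw bin ratios
    cnvkit.py genemetrics sample.cnr -s sample.cns -t 0.4 -m 5

Comparing the filtered gene lists from two bin sizes is also a decent sanity check: calls that survive at both resolutions are the ones to trust first.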


Hi Chris,

Thanks a lot. Just curious: I saw that readDepth is no longer available for newer R versions, and neither is copyCat. Are there any other similar R packages that can be used for CNV calling? I'm working with the yeast genome.


copyCat should work fine with newer R versions, as far as I know, and it has a single-sample mode.
