CNVkit: Choice of Bin Size and CNV Calling
Asked 5.7 years ago by wei.wei

I noticed that the choice of target bin size has a great impact on the results. I ran a sample with the default target bin size of 5000, and when I ran cnvkit.py genemetrics sample.cnr it reported 0 gene-level gains or losses. However, for the same sample, when I reduced the bin size to 1000, gains and losses were reported, some with strongly negative log2 values.
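Roughly, the two runs looked like this (a sketch with placeholder file paths, assuming WGS data as the 5000 bp default suggests; --target-avg-size is the batch option that controls the average bin size):

    # Run with the default 5 kb average bin size
    cnvkit.py batch sample.bam --normal normal.bam \
        --method wgs --fasta ref.fa --annotate refFlat.txt \
        --target-avg-size 5000 --output-dir out_5k/

    # Same sample with 1 kb bins
    cnvkit.py batch sample.bam --normal normal.bam \
        --method wgs --fasta ref.fa --annotate refFlat.txt \
        --target-avg-size 1000 --output-dir out_1k/

    # Gene-level gains/losses from each run
    cnvkit.py genemetrics out_5k/sample.cnr
    cnvkit.py genemetrics out_1k/sample.cnr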

I was just wondering: what would be a good choice of bin size? How can I ensure that I don't miss important gains and losses while also minimizing false positives? Will the calculated default always be a reasonable choice, or are there other factors that I should take into consideration?

I hope the question is clear. Thank you.

Tags: cnvkit, genemetrics
Answer (5.7 years ago):
It's a tradeoff: larger bin sizes reduce noise, while smaller bin sizes increase sensitivity. Panels C and D of this figure illustrate the point, as does some of the text immediately below it: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0016327#pone-0016327-g001
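If you'd rather not guess, CNVkit can also estimate a bin size from the data itself with its autobin command; a minimal sketch, assuming WGS BAMs with placeholder names (it should print a suggested average bin size):

    # Estimate an appropriate bin size from the observed read depth
    cnvkit.py autobin sample.bam --method wgs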

In general, given a 30x whole genome, 10 kb is a good size for high-confidence calls, but 1 kb can also be perfectly acceptable if proper filtering is applied (see the sketch below).
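For instance, one filtering approach is to attach confidence intervals to each segment, drop segments whose interval overlaps zero, and then require several supporting bins per gene. A rough sketch with placeholder file names; the -t and -m values are illustrative, not recommendations:

    # Compute a confidence interval for each segment from its bin-level log2 ratios
    cnvkit.py segmetrics sample.cnr -s sample.cns --ci -o sample.ci.cns

    # Keep only segments whose confidence interval excludes log2 = 0
    cnvkit.py call sample.ci.cns --filter ci -o sample.call.cns

    # Report genes passing a log2 threshold and a minimum bin count
    cnvkit.py genemetrics sample.cnr -s sample.call.cns -t 0.3 -m 5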

Hi Chris,

Thanks a lot. Just out of curiosity: I saw that readDepth is no longer available for newer R versions, and neither is copyCat. Are there any other similar R packages that can be used for CNV calling? I'm working with the yeast genome.

CopyCat should work fine with newer R versions, as far as I know, and it has a single-sample mode.
