For a few weeks, we have been using CNVkit to detect gene-sized CNVs in our somatic panel.
Encouraged by the results, we then ran the same analysis on our constitutional panel. Most CNVs detected in this panel span 1 to 3 exons. CNVkit does not see these CNVs. Is this normal?
I searched the manual and found this passage:
However, note that CNVkit is less accurate in detecting CNVs smaller
than 1 Mbp, typically only detecting variants that span multiple exons
or captured regions. When used on exome or target panel datasets,
CNVkit will not detect the small CNVs that are more common in populations.
However, if we create a BED file with small regions (e.g. 25 or even 12 bp) using the -a option, small CNVs become visible, down to 2 contiguous exons.
I am also planning to use CNVkit for my constitutional samples and am wondering about the lower size limit of the CNVs that CNVkit can detect. What is the size of the 2 contiguous exons that you detected? It would be helpful if you could please provide more details on the BED file you generated and the command line you used to create the BED file with smaller regions.
Our capture is about 420 kb (only 35 genes). We sequence samples at a depth of coverage of about 300x. In the end, the 2 CNVs detected (no false positives) are 2411 and 98 bases long. Two other single-exon CNVs were tested and could not be detected. They measure 54 and 53 bp.
The command lines were:
cnvkit.py target Capture.bed --split -a 12 -o my_targets.bed
cnvkit.py antitarget my_targets.bed -a 15000 -g data/access-5k-mappable.hg19.bed -o my_antitargets.bed
15000 is the value that gives a similar average coverage in target and off-target regions.
data/access-5k-mappable.hg19.bed is a file in the CNVkit directory.
The baseline is built from all the samples of the run (the positives are unknown). The rest of the pipeline (coverage, fix, CBS segmentation and call thresholds) is unchanged from what the manual proposes.
Hi Alice, thank you for sharing this information! I'm just starting with CNVkit and wanted to know whether it is still working well for you and whether you have made any modifications since.
Yes, CNVkit and other segmentation-based copy number callers struggle to accurately detect CNVs in constitutional samples. Using default settings, a single-exon CNV won't show up with the segmenters currently available (though you could in theory use the 'spread' and 'log2' columns in a pooled reference as the basis for a Z-test of each exon in your capture -- this is not yet supported directly).
If your sequencing data are high quality then you can subdivide the targets and antitargets more finely, as your other comment mentions, though this can result in more noise as well. Then if you've managed to increase the sensitivity of CNVkit on your data and are now seeing poor specificity, you can reduce false positives with the segmetrics --ci and call --filter ci commands.
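To illustrate the idea behind the confidence-interval filter: as I understand it, a segment whose interval around the mean log2 ratio includes zero cannot be told apart from neutral copy number, so only segments whose interval excludes zero are kept as calls. Below is a minimal Python sketch of that logic on in-memory data. It is not CNVkit's code. The field names `ci_lo` and `ci_hi` are assumed to match the segmetrics output, and the gene names and values are made up.

```python
def filter_by_ci(segments):
    """Keep only segments whose confidence interval excludes zero."""
    kept = []
    for seg in segments:
        # Interval entirely above 0 suggests a gain, entirely below 0 a loss.
        if seg["ci_lo"] > 0 or seg["ci_hi"] < 0:
            kept.append(seg)
    return kept


# Hypothetical segments, for illustration only.
segments = [
    {"gene": "GENE_A", "log2": -0.95, "ci_lo": -1.10, "ci_hi": -0.80},
    {"gene": "GENE_B", "log2": 0.12, "ci_lo": -0.05, "ci_hi": 0.30},
    {"gene": "GENE_C", "log2": 0.55, "ci_lo": 0.40, "ci_hi": 0.72},
]

confident = filter_by_ci(segments)
print([seg["gene"] for seg in confident])
```

GENE_B is dropped here because its interval straddles zero.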
I would like to try the "in theory" method you mentioned to detect single-exon CNVs. I understand how the 'log2' column can be used, but how can 'spread' be used for the Z-test? Could you share your ideas?
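One possible reading of that suggestion, offered as a sketch rather than a supported CNVkit feature: treat the reference 'log2' as the expected value of a bin in normal samples and 'spread' as its standard deviation, so that z = (sample log2 - reference log2) / spread. The function names, the bin values and the use of Stouffer's method to combine the bins of one exon are my own assumptions. If you start from a .cnr file, whose log2 values should already be relative to the reference, the reference log2 would be 0 here.

```python
import math


def bin_z(sample_log2, ref_log2, spread):
    """Z-score of one bin: deviation from the reference, in units of spread."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return (sample_log2 - ref_log2) / spread


def exon_z(bins):
    """Combine the per-bin z-scores of one exon with Stouffer's method.

    Assumes the bins are independent, which is only approximately true
    for neighbouring bins.
    """
    zs = [bin_z(b["sample_log2"], b["ref_log2"], b["spread"]) for b in bins]
    return sum(zs) / math.sqrt(len(zs))


def two_sided_p(z):
    """Two-sided p-value of a z-score under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical exon covered by three bins, values invented for illustration.
exon_bins = [
    {"sample_log2": -1.0, "ref_log2": 0.0, "spread": 0.25},
    {"sample_log2": -0.9, "ref_log2": 0.0, "spread": 0.30},
    {"sample_log2": -1.1, "ref_log2": 0.0, "spread": 0.20},
]

z = exon_z(exon_bins)
print(f"z = {z:.2f}, p = {two_sided_p(z):.2e}")
```

With one test per exon across a whole panel, the p-values would need a multiple-testing correction before being used to call anything.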
In my experience, we can detect single-exon CNVs in constitutional samples.
We use capture; amplicon data doesn't work well.
The targets are cut into bins of 20 bases. The antitarget bin size depends on the depth and the on-target rate. (We have 100X and 70% on-target, so we cut at around 20000 bases.)
Then we run CNVkit as described in the manual.
The autobin command will do this calculation for you, or you can see the source code (cnvlib.autobin) and documentation for that command to see how it's done.
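As a rough illustration of the kind of calculation involved (a simplified sketch, not the actual cnvlib.autobin code): choose the bin size inversely proportional to the depth, so that every bin collects about the same number of sequenced bases, then clamp it to a sensible range. The function and parameter names are mine. The budget of 2000 bases per bin and the off-target depth of 0.1X are invented for the example; only the 100X target depth comes from the post above.

```python
def bin_size_for_depth(depth, bases_per_bin, min_size, max_size):
    """Bin size that collects roughly `bases_per_bin` sequenced bases.

    depth: average depth of coverage in the region type (target or off-target)
    bases_per_bin: desired number of sequenced bases per bin
    min_size, max_size: bounds applied to the result
    """
    if depth <= 0:
        # No coverage at all: fall back to the largest allowed bin.
        return max_size
    size = int(round(bases_per_bin / depth))
    return max(min_size, min(size, max_size))


# On-target: 100X depth, as in the post above.
target_bin = bin_size_for_depth(100, bases_per_bin=2000,
                                min_size=20, max_size=20000)

# Off-target: 0.1X is a made-up depth, for illustration only.
antitarget_bin = bin_size_for_depth(0.1, bases_per_bin=2000,
                                    min_size=500, max_size=500000)

print(target_bin, antitarget_bin)
```

The lower the off-target depth, the larger the antitarget bins must be to keep the per-bin noise comparable to that of the targets.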