Question

How to merge CNV of multiple samples from CNVkit

7.0 years ago

cc ▴ 30

Hi, everyone,

I just got CNVkit results for multiple samples, each run separately, but there seems to be no function in CNVkit to merge CNV results across samples. Or is there a command I haven't noticed?

Thanks for your interest.

cnv cnvkit • 3.5k views

updated 7.0 years ago by Eric T. ★ 2.8k • written 7.0 years ago by cc ▴ 30

Another question about controls.

I have only 2 controls, sequenced with the same strategy as the 20 cases. Given the limited number of controls, I ran CNVkit with two strategies, which gave clearly different outputs.

Reference built from pooled normal controls (2 samples): the .cns of sampleA reports 93 segments.
Reference built with no control samples: the .cns of sampleA reports 9 segments, and the last segment is about 5.3 Mb. (The whole target region is 8 Mb in my data.)

In this case, what can I do to get more accurate results? Or can I use the reference built with no control samples, call CNVs for all cases and the 2 controls, and then compare the results between cases and controls?

My data are from targeted sequencing (hybrid capture) of an 8 Mb region. Would it be acceptable to build a pooled reference from WGS data of normal controls from another paper?

Thanks in advance

written 7.0 years ago by cc ▴ 30

score 1 · Answer 1 · 2017-12-14

What type of result do you want from merging samples?

If you're interested in spotting recurrent copy number alterations, try the heatmap command with all of your .cns files.
If you want to quantify the significance of recurrent alterations, use the export seg command to create a SEG file from each .cns file of interest, then use the output SEG files with a recent version of GISTIC2. (The "markers" file is no longer needed in recent versions of GISTIC, I'm told. Just the SEG files should be enough. I haven't tried this myself.)
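As a rough sketch of the two options above, the following Python helpers build the argument lists for cnvkit.py heatmap and cnvkit.py export seg. The function names, the output file names, and the assumption that all .cns files sit in one directory are hypothetical; only the heatmap and export seg subcommands and their -o option come from CNVkit itself.

```python
import subprocess
from pathlib import Path


def heatmap_cmd(cns_files, out_pdf="heatmap.pdf"):
    """Argument list for `cnvkit.py heatmap` over several .cns files."""
    return ["cnvkit.py", "heatmap", *map(str, cns_files), "-o", str(out_pdf)]


def export_seg_cmd(cns_files, out_seg="samples.seg"):
    """Argument list for `cnvkit.py export seg`, writing one SEG file."""
    return ["cnvkit.py", "export", "seg", *map(str, cns_files), "-o", str(out_seg)]


def run_all(cns_dir="."):
    """Run both commands on every .cns file in cns_dir (assumed layout)."""
    cns_files = sorted(Path(cns_dir).glob("*.cns"))
    for cmd in (heatmap_cmd(cns_files), export_seg_cmd(cns_files)):
        # check=True raises CalledProcessError if cnvkit.py exits non-zero
        subprocess.run(cmd, check=True)
```

Calling run_all("results/") would need cnvkit.py on the PATH. Note that the *.cns pattern also matches files such as Sample.call.cns, so pass an explicit file list if several kinds of .cns output share the directory.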

If you build a reference from control samples, the control samples should be prepared and sequenced with the same protocol as the test samples -- NOT WGS if the test samples were sequenced with a target panel. If you do have process-matched controls, then a pooled reference built from those controls is usually better than a control-free reference. Otherwise, just use a control-free reference.
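For illustration, the two kinds of reference could be built roughly as follows. This is a minimal sketch: the helper names and all file names are placeholders, and it assumes the target and antitarget coverage (.cnn) files for the process-matched normals have already been computed. The -f, -t, -a and -o options are those of cnvkit.py reference.

```python
import subprocess


def pooled_reference_cmd(normal_cnn_files, fasta, out="pooled_reference.cnn"):
    """Pooled reference from the coverage (.cnn) files of process-matched normals."""
    return ["cnvkit.py", "reference", *normal_cnn_files, "-f", fasta, "-o", out]


def flat_reference_cmd(targets_bed, antitargets_bed, fasta, out="flat_reference.cnn"):
    """Control-free ("flat") reference built from the target/antitarget BED files only."""
    return ["cnvkit.py", "reference", "-t", targets_bed, "-a", antitargets_bed,
            "-f", fasta, "-o", out]


def build(cmd):
    """Execute one of the commands above; requires cnvkit.py on the PATH."""
    subprocess.run(cmd, check=True)
```

For example, build(pooled_reference_cmd(["N1.targetcoverage.cnn", "N1.antitargetcoverage.cnn", "N2.targetcoverage.cnn", "N2.antitargetcoverage.cnn"], "genome.fa")) would cover the two-control case from the question, with hypothetical file names.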

Don't worry too much about the number of segments; instead, use the segmetrics and call commands to do further filtering if you need it. You can also repeat segmentation with a more stringent p-value threshold (-t) to reduce the number of segment breakpoints in the .cns files.
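The filtering and re-segmentation steps might look like the sketch below. The helper names, file names, and the threshold value 1e-6 are illustrative assumptions, not recommendations; the --ci option of segmetrics and the --filter ci option of call are one possible filtering combination, so check the options available in your installed CNVkit version.

```python
import subprocess


def resegment_cmd(cnr, out_cns, threshold=1e-6):
    """Repeat segmentation with a stricter significance threshold (-t)."""
    return ["cnvkit.py", "segment", cnr, "-t", str(threshold), "-o", out_cns]


def segmetrics_cmd(cnr, cns, out_cns):
    """Add a confidence interval for each segment's log2 ratio."""
    return ["cnvkit.py", "segmetrics", cnr, "-s", cns, "--ci", "-o", out_cns]


def call_cmd(segmetrics_cns, out_cns):
    """Call copy number, filtering segments by the confidence interval."""
    return ["cnvkit.py", "call", segmetrics_cns, "--filter", "ci", "-o", out_cns]


def filter_sample(sample):
    """Run segmetrics then call for one sample (hypothetical file naming)."""
    steps = (
        segmetrics_cmd(f"{sample}.cnr", f"{sample}.cns", f"{sample}.segmetrics.cns"),
        call_cmd(f"{sample}.segmetrics.cns", f"{sample}.call.cns"),
    )
    for cmd in steps:
        subprocess.run(cmd, check=True)  # requires cnvkit.py on the PATH
```

A smaller -t value accepts fewer breakpoints, so the 93-segment result from the pooled reference would be expected to shrink, at the cost of possibly missing short alterations.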