CNVkit: Overlapping Segments?
8.5 years ago
fongchunchan ▴ 10

After running cnvkit.py segment, I've noticed that some segments appear to overlap with each other. For instance,

 18  18520842  60987290  BCL2  0.83552 424 131.823
 18  60987231  78016748  BCL2  0.04287 175 54.1546

The end of the first segment (60,987,290) overlaps with the beginning of the second segment (60,987,231). Is this expected? I expected the segments to be non-overlapping. This becomes an issue when mapping mutations onto the segments: a mutation in the overlapping region would be associated with multiple copy number predictions, which should not be the case.

Thanks,
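
For anyone wanting to check their own segment file for the same problem, here is a minimal sketch. It assumes the segments have already been parsed into (chromosome, start, end, log2) tuples with BED-style half-open coordinates; the function name `find_overlaps` is hypothetical and not part of CNVkit. It only compares consecutive segments after sorting, which is enough to catch the case shown above.

```python
def find_overlaps(segments):
    """Return (previous, current, overlap_bp) for consecutive segments that overlap.

    Each segment is a (chrom, start, end, log2) tuple. Coordinates are assumed
    to be half-open, so a segment starting exactly where the previous one ends
    does not count as an overlap.
    """
    ordered = sorted(segments, key=lambda s: (s[0], s[1]))
    overlaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        same_chrom = prev[0] == curr[0]
        if same_chrom and curr[1] < prev[2]:
            overlaps.append((prev, curr, prev[2] - curr[1]))
    return overlaps


# The two segments from the question above.
segments = [
    ("18", 18520842, 60987290, 0.83552),
    ("18", 60987231, 78016748, 0.04287),
]

for prev, curr, size in find_overlaps(segments):
    print(f"chr{prev[0]}: segment ending at {prev[2]} overlaps "
          f"segment starting at {curr[1]} by {size} bp")
```

For the two rows above, the overlap is 60,987,290 - 60,987,231 = 59 bp.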

cnvkit • 3.0k views
8.5 years ago
Eric T. ★ 2.8k

In general the segments should not overlap. Diagnostics:

  • Which version of CNVkit are you using? The current version is v0.7.11.
  • Is your sequencing protocol hybrid capture? Did you use the baited regions or "capture targets" BED file?
  • Did you use the "--split" argument with the batch or target command?
  • Which segmentation method did you use, the default CBS or another one?

(The note in the gainloss docs refers to a different situation where there is a breakpoint in the middle of the gene, so the "left" half of the gene belongs to one segment and the right half belongs to another.)

  • 0.7.11
  • Yes, it's hybrid capture, and yes, I used a capture targets BED file.
  • I am not using the --split argument.
  • Using the default segmentation method.

Thanks. My guess is that your target BED file contains overlapping intervals. Using the --split option will merge the overlapping intervals and then divide them evenly, which should solve your problem.
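
To illustrate the idea of merging overlapping intervals and then dividing them evenly, here is a rough sketch. This is not CNVkit's actual implementation; the function names and the `avg_size` parameter are hypothetical, and intervals are assumed to be (chromosome, start, end) tuples in BED-style half-open coordinates.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Extend the previous interval instead of starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


def split_evenly(chrom, start, end, avg_size):
    """Divide one interval into equal-sized pieces of roughly avg_size bases."""
    n = max(1, round((end - start) / avg_size))
    bounds = [start + (end - start) * i // n for i in range(n + 1)]
    return [(chrom, bounds[i], bounds[i + 1]) for i in range(n)]


# Made-up target intervals, two of which overlap.
targets = [("18", 100, 300), ("18", 250, 500), ("18", 900, 1000)]

merged = merge_intervals(targets)
bins = [piece for iv in merged for piece in split_evenly(*iv, avg_size=200)]
print(merged)
print(bins)
```

After merging, no two intervals share any bases, so bins derived from them cannot overlap either.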


Thanks. Using the --split option seems to have solved the overlapping intervals.


+1 for the detailed answer. I guess the OP has a workaround now. If these answers solve the problem, using the --split parameter and changing the segmentation method, it would be nice to get a follow-up reply from him. It keeps up interest in the forum.

8.5 years ago
natasha.sernova ★ 4.0k

See this pdf:

https://media.readthedocs.org/pdf/cnvkit/v0.4.1/cnvkit.pdf and find there:

2.3.2 gainloss

"If segments are given, the log2 ratio value reported for each gene will be the value of the segment covering the gene. Where more than one segment overlaps the gene, i.e. if the gene contains a breakpoint, each segment’s value will be reported as a separate row for the same gene. If a large-scale CNA covers multiple genes, each of those genes will be listed individually".
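
The behaviour described in that passage can be sketched roughly as follows. This is only an illustration of the quoted rule, not the gainloss code itself; the function name `gene_rows`, the gene name and all coordinates are made up.

```python
def gene_rows(gene, chrom, gene_start, gene_end, segments):
    """Return one (gene, chrom, start, end, log2) row per segment overlapping the gene.

    Segments are (chrom, start, end, log2) tuples in half-open coordinates.
    A gene containing a breakpoint therefore yields more than one row.
    """
    rows = []
    for seg_chrom, seg_start, seg_end, log2 in segments:
        if seg_chrom == chrom and seg_start < gene_end and seg_end > gene_start:
            rows.append((gene, chrom,
                         max(seg_start, gene_start),
                         min(seg_end, gene_end),
                         log2))
    return rows


# Hypothetical gene spanning a breakpoint at position 150.
example_segments = [("18", 0, 150, 0.8), ("18", 150, 400, 0.0)]
for row in gene_rows("GENE_X", "18", 100, 200, example_segments):
    print(row)
```

Here the two segments do not overlap each other, but the gene still gets two rows because the breakpoint falls inside it.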
