CNVkit for somatic copy number detection

5.6 years ago

stephaniem • 0

Hello!

I am trying to use CNVkit (https://cnvkit.readthedocs.io/en/stable/pipeline.html) to detect somatic copy number variations in 40 paired tumor-normal WES samples. I can run the pipeline by following the current documentation, but from the output I am unsure how to tell whether the detected variants are germline or somatic. I am interested in both, but mainly in somatic copy number variations.

Additionally, I would like to tune the parameters (I am only using the defaults now) so the pipeline runs as effectively as possible. For example: in the autobin step, which BAM files should be used (normal, tumor, or both)? And for the reference, should I keep pooling all the normals, or use each tumor's matched normal for better somatic CNV detection?

Please let me know if you have any advice or suggestions on how to proceed with this type of analysis. Thanks so much in advance!

Best Regards, Stephanie

cnv cnvkit exome WES

The normal samples are pooled into a single reference for the cohort. I have a doubt here: does "normal sample" mean a sample taken from an unaffected (non-cancer) individual, or a sample taken from normal tissue of the cancer patient? Which samples should I use to build the cohort reference?

Comment by Anitha, 6 months ago

score 4 · Accepted Answer · 2019-04-15

For autobin, use normal samples. If you have at least 5 or so normal samples prepared according to the same lab process, use those as a single pooled reference for your cohort.
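
As a rough illustration of this advice, the Python sketch below only assembles the command lines (it runs nothing): autobin sees only the normal BAMs, and `batch --normal` pools every normal into one reference that all tumors are called against. The file names are placeholders, the 5-normal check is just a soft warning, and the flags follow the examples in the CNVkit documentation; verify them with `cnvkit.py autobin --help` and `cnvkit.py batch --help` for your installed version.

```python
import shlex
import warnings
from pathlib import Path


def build_cnvkit_commands(normal_bams, tumor_bams, targets_bed, access_bed,
                          fasta, outdir="results"):
    """Assemble (not run) CNVkit command lines for a pooled-normal analysis.

    All file names are placeholders; flags follow the CNVkit docs' examples
    and should be checked against `--help` for your installed version.
    """
    if len(normal_bams) < 5:
        # The advice suggests roughly 5+ normals prepared by the same lab process.
        warnings.warn("fewer than ~5 normals; a pooled reference may be noisy")

    # Bin-size estimation: give autobin the normal BAMs only.
    autobin = ["cnvkit.py", "autobin", *normal_bams,
               "-t", targets_bed, "-g", access_bed]

    # Pool every normal into one reference and call all tumors against it.
    batch = ["cnvkit.py", "batch", *tumor_bams,
             "--normal", *normal_bams,
             "--targets", targets_bed,
             "--fasta", fasta,
             "--access", access_bed,
             "--output-reference", str(Path(outdir) / "pooled_reference.cnn"),
             "--output-dir", outdir]
    return autobin, batch


if __name__ == "__main__":
    normals = [f"P{i:02d}_normal.bam" for i in range(1, 41)]
    tumors = [f"P{i:02d}_tumor.bam" for i in range(1, 41)]
    for cmd in build_cnvkit_commands(normals, tumors, "baits.bed",
                                     "access.bed", "genome.fa"):
        print(shlex.join(cmd))
```

autobin reports suggested bin sizes and writes resized target/antitarget BED files; how those feed into `batch` depends on your CNVkit version, so that wiring is deliberately left out of the sketch.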

You can probably assume that the CNV calls you get from the default pipeline are somatic. Population-level CNVs are typically too small to be picked up by CBS segmentation of WES samples, and if you use a pooled reference, then CNVkit will also tend to de-emphasize regions with variable copy number / coverage in your pool of control samples.
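
To make the "de-emphasize variable regions" point concrete, here is a deliberately simplified Python sketch: each bin's reference value is the median log2 coverage across the pooled normals, and bins whose coverage varies a lot between normals get a lower weight. The weighting rule `1 / (1 + spread**2)` is a hypothetical choice for illustration, not CNVkit's actual formula. CNVkit's reference files do record a per-bin spread, but consult its documentation for how weights are really computed.

```python
import statistics


def build_pooled_reference(normal_log2):
    """Summarize per-bin log2 coverage across pooled normal samples.

    normal_log2: one list per normal sample, each with the same bins in the
    same order. Returns (reference_log2, spread, weight) for every bin.
    """
    reference = []
    for bin_values in zip(*normal_log2):
        center = statistics.median(bin_values)
        spread = statistics.pstdev(bin_values)
        # Hypothetical down-weighting: noisy bins count for less downstream.
        weight = 1.0 / (1.0 + spread ** 2)
        reference.append((center, spread, weight))
    return reference


def normalize_tumor(tumor_log2, reference):
    """Return (log2 ratio vs. reference, weight) for each tumor bin."""
    return [(t - center, weight)
            for t, (center, _spread, weight) in zip(tumor_log2, reference)]


if __name__ == "__main__":
    normals = [
        [0.00, 0.10, 1.00],    # bin 3 swings between normals,
        [0.05, -0.10, -1.00],  # e.g. a common germline CNV or
        [-0.05, 0.00, 0.90],   # a poorly captured region
    ]
    ref = build_pooled_reference(normals)
    for i, (center, spread, weight) in enumerate(ref, start=1):
        print(f"bin {i}: ref={center:+.2f} spread={spread:.2f} weight={weight:.2f}")
    print(normalize_tumor([-1.0, 0.05, 1.2], ref))
```

With this toy data the third bin ends up with a noticeably lower weight than the first, so a segmenter that respects weights would lean less on it.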

It's possible that some of your samples have rare, cancer-associated germline CNVs, which would then be present in both the tumor and normal samples from a given patient. The next version of CNVkit (0.9.7, also the current code on GitHub) has an improved HMM segmentation method that can pick up these smaller CNVs, as well as a "bintest" command to test individual exons. To distinguish somatic from germline, I'd recommend building a pooled reference as usual to call CNVs in the tumor sample (using HMM or bintest), then using the same reference to call CNVs in the matched normal. Then compare the two sets of calls: a CNV called in the normal that is also present in the tumor is likely germline, while a normal-sample CNV with no counterpart in the tumor is likely a false positive.
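
The tumor-versus-normal comparison can be sketched in Python as interval matching. All of the following are assumptions for illustration: calls have already been exported as simple (chromosome, start, end, direction) records (for example, direction taken from the sign of each segment's log2 value), two calls "match" when they share a direction and overlap reciprocally by at least 50%, and that threshold is a tunable choice, not a CNVkit default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    start: int      # 0-based, half-open coordinates
    end: int
    direction: str  # "gain" or "loss"


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 if the calls don't overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def classify(tumor_calls, normal_calls, min_overlap=0.5):
    """Label tumor calls as germline/somatic and normal calls as germline/likely FP."""
    def matches(x, y):
        return x.direction == y.direction and reciprocal_overlap(x, y) >= min_overlap

    tumor_labels = {
        t: "germline" if any(matches(t, n) for n in normal_calls) else "somatic"
        for t in tumor_calls
    }
    normal_labels = {
        n: "germline" if any(matches(n, t) for t in tumor_calls) else "likely false positive"
        for n in normal_calls
    }
    return tumor_labels, normal_labels


if __name__ == "__main__":
    tumor = [Call("chr17", 100, 200, "loss"), Call("chr8", 0, 1000, "gain")]
    normal = [Call("chr17", 110, 210, "loss"), Call("chr2", 5, 50, "loss")]
    t_labels, n_labels = classify(tumor, normal)
    for call, label in {**t_labels, **n_labels}.items():
        print(call, label)
```

Tumor-only calls form the somatic set; shared calls are germline candidates that still deserve a biological sanity check.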

If you think you've found a real cancer-associated germline CNV in your sample, check that it makes biological sense -- it should affect a cancer-associated gene in the right direction (e.g. hemizygous loss in a tumor suppressor), and if you have access to any clinical information about the patient, you would expect to see a family history of cancer, young onset of disease, or other cancer-related conditions.