Calling CNA from aCGH: Thresholding?
8.9 years ago
Noushin N ▴ 600

Hi everybody,

I am working on array CGH data from a set of tumor samples with potentially high normal contamination. I have performed the necessary pre-processing (background subtraction, within-array normalization) on the tumor data, and run circular binary segmentation (CBS) on the resulting log ratio values.

I have seen that people often threshold the mean segment values following this step to call gains and losses (a typical value I have seen in the literature is a minimum absolute value of 0.3 for the log2 ratio). However, this process assumes a minimum tumor purity, which may not hold for a subset of my samples.

I was wondering if anyone can suggest a more systematic way of approaching this problem in the presence of high levels (50-80%) of normal tissue contamination in the tumor sample.

Thank you!

array-cgh copy-number • 2.3k views

Have you estimated the tumor purity of each of your samples already, or is that the next step you're looking to do?


I do have some (possibly not highly accurate) estimates from somatic mutation allele frequency data, and also rougher ones from the pathology report.

8.9 years ago
Eric T. ★ 2.8k

In the absence of purity estimates, thresholding is pretty much your only option. If your data is noisy, then you might opt to focus on high-level amplifications and homozygous deletions in your report. Given another source of data like SNV calls, you can also use the allele frequencies to detect loss of heterozygosity, which can support copy number calls for hemizygous losses at least.

Given purity estimates, you can rescale the log2 values to subtract the expected contribution of the normal cells, and then use thresholds more confidently.
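
The rescaling itself is simple arithmetic. Here is a minimal Python sketch of the usual two-component mixture model, assuming the normal cells carry 2 copies everywhere (autosomes only) and the log2 ratios are centered on a diploid baseline. The function name `rescale_log2` and the `min_copies` floor are my own choices for illustration, not part of any library:

```python
import math


def rescale_log2(log2_ratio, purity, normal_copies=2, min_copies=0.01):
    """Estimate the tumor-only log2 ratio from an observed, normal-diluted one.

    Assumed model: observed copies = purity * tumor copies
                                     + (1 - purity) * normal copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average copy number across the whole (mixed) sample
    observed_copies = normal_copies * 2 ** log2_ratio
    # Remove the normal-cell contribution, then scale up by the tumor fraction
    tumor_copies = (observed_copies - (1 - purity) * normal_copies) / purity
    # Noise can push the estimate to zero or below; floor it so log2 is defined
    tumor_copies = max(tumor_copies, min_copies)
    return math.log2(tumor_copies / normal_copies)
```

Under this model, a single-copy gain at 45% purity would be observed at a log2 ratio of about 0.29, just under the 0.3 cutoff mentioned in the question; after rescaling it comes out near 0.58. Keep in mind that rescaling amplifies noise along with signal, so the lower the purity, the less reliable the adjusted values.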

For example, if your segmented aCGH calls are in SEG format (the output of DNAcopy), then you can use CNVkit to adjust the log2 values:

  1. Load a sample into CNVkit's own format:

    cnvkit.py import-seg Sample.seg -o Sample.cns
    
  2. Rescale the log2 values using a purity estimate and optionally the ploidy:

    cnvkit.py rescale Sample.cns --purity 0.45 --ploidy 2 -o Sample-rescaled.cns
    
  3. Optionally, perform the thresholding by assuming 100% purity (now valid) and rounding log2 values to the nearest absolute integer copy number:

    cnvkit.py call -m clonal Sample-rescaled.cns -o Sample-called.cns
    

    Or by using hard cutoffs (in log2 scale):

    cnvkit.py call -m threshold -t=-1.1,-0.4,0.3,0.7 Sample-rescaled.cns -o Sample-called.cns
    
  4. Export the adjusted segments back to SEG:

    cnvkit.py export seg Sample-called.cns -o Sample-called.seg
    

    Or to another format:

    cnvkit.py export bed Sample-called.cns -o Sample-called.bed
    cnvkit.py export vcf Sample-called.cns -o Sample-called.vcf
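
For intuition on the hard-cutoff option in step 3, here is a rough Python sketch of how four log2 cutoffs can split segments into copy-number bins of 0, 1, 2, 3, and 4 or more. The function `call_copies` is hypothetical, and the choice to put a value exactly equal to a cutoff in the lower bin is my assumption; this is not CNVkit's actual code:

```python
from bisect import bisect_left


def call_copies(log2_ratio, cutoffs=(-1.1, -0.4, 0.3, 0.7)):
    """Map a (purity-rescaled) log2 ratio to an integer copy-number bin.

    Returns 0..4, where 4 stands for "4 or more copies".
    """
    # bisect_left counts how many cutoffs are strictly below the value,
    # so a value exactly on a cutoff lands in the lower bin.
    return bisect_left(cutoffs, log2_ratio)
```

With the cutoffs above, a rescaled single-copy gain (log2 near 0.58) lands in the 3-copy bin, while the same segment before rescaling at 45% purity (log2 near 0.29) would be called neutral.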
    

Thanks so much Etal!
