Question

Noise Dependent Copy Number Segmentation

2


14.5 years ago

Yuri ★ 1.7k

I'm dealing with quite noisy copy number data from Affymetrix arrays (100K, 500K, etc.). The achievable resolution of aberrant regions clearly depends on the signal-to-noise ratio: the noisier the data, the larger a region has to be before it can be detected reliably. With the noisiest samples I can probably only call whole-chromosome loss or gain reliably, but that's OK.

However, many segmentation algorithms I have tried (HMM, CBS, FASeg, etc.) do not estimate noise before processing the data, so I have to optimize the parameters for almost every single sample.

Do you know of, or have experience with, an algorithm that would automate this task? Or what are the best practices for this kind of analysis?

cnv algorithm • 4.0k views

ADD COMMENT • link updated 13.9 years ago by Khader Shameer 18k • written 14.5 years ago by Yuri ★ 1.7k

0


Are you looking at chromosome aberrations?

ADD REPLY • link 14.5 years ago by Khader Shameer 18k

0


Yes, chromosomal deletions and amplifications.

ADD REPLY • link 14.5 years ago by Yuri ★ 1.7k

score 6 · Answer 1 · 2010-06-03

All aCGH data is pretty noisy. If it were clean, then demarcating regions of CN gain and loss would be easy and we wouldn't need complex segmentation algorithms. One of the points of these algorithms is that they average the signal across multiple probes to try to cut through that noise.
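To illustrate why averaging across probes helps, here is a minimal sketch using simulated data rather than real array output. The noise level (SD 0.5), the window width of 25 probes, and the helper `window_means` are all assumptions made up for the illustration; real segmentation algorithms choose segment boundaries adaptively instead of using fixed windows.

```python
import random
import statistics

def window_means(values, width):
    """Average consecutive, non-overlapping windows of `width` probes."""
    return [
        statistics.fmean(values[i:i + width])
        for i in range(0, len(values) - width + 1, width)
    ]

rng = random.Random(0)
# Hypothetical probes: true log2 ratio of 0 plus Gaussian noise (SD 0.5).
probes = [rng.gauss(0.0, 0.5) for _ in range(10000)]

sd_single = statistics.stdev(probes)
sd_avg25 = statistics.stdev(window_means(probes, 25))

print(f"single-probe SD:  {sd_single:.3f}")
print(f"25-probe mean SD: {sd_avg25:.3f}")  # expected near sd_single / sqrt(25)
```

For independent noise, the standard deviation of an n-probe mean shrinks by about a factor of sqrt(n), which is why larger regions remain detectable in noisier samples.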

That said, unless the array prep was botched and the data is really nasty, you should be getting way more than whole-chromosome resolution from any of those platforms.

Here are some tips for using CBS through the DNAcopy package (with which I'm most familiar):

If you're following the vignette, notice that it includes a smoothing step that will help take care of some of the outliers.
Be sure to set the minimum number of probes to something sensible. I'd say use three probes minimum, and if you want to be really confident, use something more like 5 or 6.
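If you'd rather tie the minimum probe count to each sample's noise instead of picking it by hand, one possible approach is sketched below. This is not part of DNAcopy: it estimates the noise SD from the median absolute deviation of adjacent-probe differences (which is largely insensitive to a handful of breakpoints), then asks how many probes must be averaged for a shift of a given size to stand `z` standard errors above zero. The function names, the `z=3.0` threshold, the floor of three probes, and the simulated profile are all assumptions for illustration.

```python
import math
import random
import statistics

def estimate_noise_sd(log_ratios):
    """Robust noise SD from differences between adjacent probes."""
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    med = statistics.median(diffs)
    mad = statistics.median([abs(d - med) for d in diffs])
    # 1.4826 scales a MAD to an SD for Gaussian data; a difference of two
    # independent probes has variance 2 * sigma^2, hence the sqrt(2).
    return 1.4826 * mad / math.sqrt(2)

def min_probes(noise_sd, shift, z=3.0, floor=3):
    """Smallest probe count whose mean resolves `shift` at `z` standard errors."""
    needed = math.ceil((z * noise_sd / shift) ** 2)
    return max(floor, needed)

rng = random.Random(1)
# Hypothetical profile: 500 normal probes, a 200-probe gain of 0.4, 500 normal.
truth = [0.0] * 500 + [0.4] * 200 + [0.0] * 500
sample = [t + rng.gauss(0.0, 0.3) for t in truth]

noise_sd = estimate_noise_sd(sample)
print(f"estimated noise SD: {noise_sd:.3f}")
print(f"suggested minimum probes for a 0.4 shift: {min_probes(noise_sd, 0.4)}")
```

The suggested count is only a starting point: it assumes independent Gaussian noise, so wave artefacts or autocorrelated probes would call for a larger value.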

You might also consider using the NoWaves package, which removes another type of bias in the data. (I have no experience with this, but have heard good things from a colleague.)

score 5 · Answer 2 · 2010-06-03

These kinds of data are inherently noisy, unfortunately. There is surprisingly little discussion of noise in any of the relevant Bioconductor documentation, except for suggested visualisations of signal/noise ratio and some brief comments on smoothing. See, e.g. crlmm, VanillaICE (HMMs for copy number estimation) and DNAcopy, for some sample workflows.

A Google search for "copy number" + "noise" throws up some references that look interesting.

A Bayesian segmentation approach to ascertain copy number variations at the population level - authors claim that "Our Bayesian approach, on the other hand, identifies the exact true segments even when noise levels are high."

Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA - motivation for the study described as "New approaches capable of jointly modeling the copy number and the non-copy number (noise) hybridization effects across multiple samples will potentially lead to more accurate results."

More discussion of noise in:

A versatile statistical analysis algorithm to detect genome copy number variation

and:

Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays - "The method described here has been developed to reduce systematic noise and precisely extract significant intensity information".

score 0 · Answer 3 · 2010-06-03

I know about Illumina-based algorithms for detecting chromosome aberrations. For example, CNV Partition can be used for detecting copy number variation using log R ratio and B-allele frequency. I am sure there will be similar tools for Affy arrays. Depending upon the type of aberrations you are looking at, you may need to add additional algorithms to your analysis.