Question

aCGH CNV detection

2

9.5 years ago

Chico Fernandez ▴ 80

Hello,

I am doing CNV analysis on NGS data and need to compare the performance of NGS-based tools against the gold standard, aCGH.

However, I have found many tools that can analyze aCGH data.

Could you give me some insight into which tool I should use for this analysis? As I understand it, aCGH is a fairly old technology with well-established methods for CNV detection, but I found that different tools use different approaches (HMMs, a variety of segmentation algorithms, etc.). Visualisation will also be required. Which tools are the best?

aCGH CNV • 2.7k views

ADD COMMENT • link updated 3.0 years ago by Ram 45k • written 9.5 years ago by Chico Fernandez ▴ 80

Ram · Answer 1 · 2016-02-03

2

9.5 years ago

Eric T. ★ 2.9k

The CBS algorithm is fairly well established for array CGH. You can use it through the Broad's GenePattern server or the R/Bioconductor package DNAcopy.

Also consider a Bland-Altman plot (or the reasoning behind it) for evaluating the agreement between aCGH and DNAseq, as aCGH isn't perfect either.
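For concreteness, here is a minimal sketch of the Bland-Altman calculation in Python with NumPy. It assumes you already have log2 ratios from both platforms, paired over the same regions (segments, genes, or bins). The function name `bland_altman` and the example values are made up for illustration and do not come from any real dataset or tool.

```python
import numpy as np

def bland_altman(a, b):
    """Return per-pair means, differences, bias, and 95% limits of agreement.

    `a` and `b` are paired measurements of the same regions
    (here assumed to be log2 ratios from aCGH and from sequencing).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b                 # disagreement for each pair
    mean = (a + b) / 2.0         # best available estimate of the true value
    bias = diff.mean()           # systematic offset between the two methods
    sd = diff.std(ddof=1)        # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean, diff, bias, limits

# Hypothetical paired log2 ratios, for illustration only
acgh = np.array([0.0, 0.5, -1.0, 1.0])
seq = np.array([0.1, 0.4, -0.8, 0.9])

mean, diff, bias, limits = bland_altman(acgh, seq)
print(f"bias = {bias:.3f}, limits of agreement = ({limits[0]:.3f}, {limits[1]:.3f})")
```

Plotting `mean` on the x-axis against `diff` on the y-axis, with horizontal lines at the bias and at the two limits, gives the usual Bland-Altman plot.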

ADD COMMENT • link 9.5 years ago by Eric T. ★ 2.9k

0

Thank you, Etal! But does aCGH CNV detection consist only of the CBS algorithm? Am I right that aCGH can require batch effect correction too? (For example, several tools suggest using a synthetic reference for the log-fold-change calculation, and sometimes it works really badly: the data become overdispersed and noisy, whereas CNV detection using a matched aCGH normal pair works better...) Does the batch effect issue exist for aCGH data?

The Bland-Altman plot is very similar to SNP array plots (for CNV detection). Should I use it on SNPs (alt-ref)? Or just plot the log fold change for DNAseq and for aCGH?

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 9.5 years ago by Chico Fernandez ▴ 80

1

My understanding is that the aCGH kits available now are already designed by the manufacturer to hybridize evenly across the whole genome, so a synthetic reference isn't necessary. (But I could be wrong.) For your benchmarking it's best to minimize the number of variables, and the design of any reference that you use is another variable that could lead to unrepresentative results.

You don't need to show an actual Bland-Altman plot in your final write-up, but I recommend reading the article and understanding why they propose this plotting approach. TL;DR: Don't report Pearson correlation, it's misleading.

ADD REPLY • link 9.5 years ago by Eric T. ★ 2.9k

0

But why is r^2 misleading? You reported r^2 in your CNVkit paper, if I remember correctly...

(I read the paper you suggested and figured out why, sorry)

ADD REPLY • link 9.5 years ago by Chico Fernandez ▴ 80

1

Yes, I did! That was before I'd been fully persuaded otherwise (2014), and I did try to emphasize other metrics. The final publication will report the 95% limits of agreement instead of the r^2 value. While 98% correlation sounds great, it's not really surprising because the two methods are measuring the same thing; they're not separate phenomena as correlation implies.
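A toy illustration of the point, as a sketch in Python with NumPy. The data are synthetic and chosen by me for the example: the "sequencing" values are a compressed and shifted copy of the "aCGH" values, so the two disagree substantially even though they are perfectly linearly related.

```python
import numpy as np

# Synthetic log2 ratios, for illustration only (not real measurements)
acgh = np.linspace(-1.0, 1.0, 21)
seq = 0.5 * acgh + 0.3           # half the dynamic range, plus a constant shift

# Pearson correlation only measures linear association
r = np.corrcoef(acgh, seq)[0, 1]

# 95% limits of agreement measure how far apart the two methods actually are
diff = seq - acgh
bias = diff.mean()
sd = diff.std(ddof=1)
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd

print(f"Pearson r = {r:.3f}")
print(f"bias = {bias:.3f}, 95% limits of agreement = ({lower:.3f}, {upper:.3f})")
```

Here the correlation is essentially perfect, yet the limits of agreement are wide and the bias is far from zero, which is the disagreement the correlation hides.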

ADD REPLY • link 9.5 years ago by Eric T. ★ 2.9k