Hello,
I am doing CNV analysis for NGS and have to compare the performance of NGS-based tools with the gold standard.
However, I found that there are a lot of tools that can perform aCGH data analysis.
Could you give me some insight into which tool I should use for this analysis? As I understand it, aCGH is quite an old technology and there are well-established methods for CNV detection with it, but I found that different tools use different approaches (HMMs, various segmentation algorithms, etc.). Visualisation will also be required. Which tools are the best?
Thank you, Etal! But does aCGH CNV detection consist only of the CBS algorithm? Am I right that aCGH can require batch effect correction too? (For example, several tools suggest using a synthetic reference for the log-fold-change calculation, and sometimes it works really badly: the data become overdispersed and noisy, whereas CNV detection using a matched aCGH normal pair works better...) Does the batch effect issue exist for aCGH data?
The Bland-Altman plot looks very similar to SNP array plots (for CNV detection). Should I use it on SNPs (alt-ref)? Or just plot the log fold change for DNA-seq and for aCGH?
My understanding is that the aCGH kits available now are already designed by the manufacturer to hybridize evenly across the whole genome, so a synthetic reference isn't necessary. (But I could be wrong.) For your benchmarking it's best to minimize the number of variables, and the design of any reference you use is one more variable that could lead to unrepresentative results.
You don't need to show an actual Bland-Altman plot in your final write-up, but I recommend reading the article and understanding why they propose this plotting approach. TL;DR: Don't report Pearson correlation, it's misleading.
But why is r^2 misleading? You reported r^2 in your CNVkit paper, if I remember correctly...
(I read the paper you suggested and figured out why, sorry)
Yes, I did! That was before I'd been fully persuaded otherwise (2014), and I did try to emphasize other metrics. The final publication will report the 95% limits of agreement instead of the r^2 value. While 98% correlation sounds great, it's not really surprising because the two methods are measuring the same thing; they're not separate phenomena as correlation implies.
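In case it helps, here is a minimal sketch of the limits-of-agreement calculation, assuming you already have paired log2 ratios from both platforms over the same segments. The values and the names `acgh_log2`, `ngs_log2`, `limits_of_agreement` and `pearson_r` are made up for illustration and do not come from any particular tool. The toy NGS values are deliberately compressed by half, so they correlate perfectly with aCGH while still disagreeing with it.

```python
import math
import statistics


def limits_of_agreement(a, b):
    """Return (bias, lower, upper) for the paired differences a - b.

    bias is the mean difference; lower/upper are bias -/+ 1.96 * SD,
    i.e. the 95% limits of agreement in the Bland-Altman sense.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(a, b):
    """Plain Pearson correlation coefficient, computed by hand."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up segment log2 ratios, for illustration only (not real data).
acgh_log2 = [-1.0, -0.5, 0.0, 0.3, 0.6, 1.0]
# Hypothetical NGS values: same ordering as aCGH, but half the amplitude.
ngs_log2 = [0.5 * x for x in acgh_log2]

r = pearson_r(acgh_log2, ngs_log2)
bias, lower, upper = limits_of_agreement(ngs_log2, acgh_log2)
print(f"Pearson r = {r:.3f}")
print(f"bias = {bias:.3f}, 95% limits of agreement = [{lower:.3f}, {upper:.3f}]")
```

The correlation comes out essentially perfect even though the two sets of values differ by up to 0.5 in log2 units, which is the kind of disagreement the limits of agreement expose and r hides. For the plot itself you would put the per-segment mean of the two methods on the x-axis and the difference on the y-axis.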