I'm doing copy number variation (CNV) analysis using next-generation sequencing right now. I need some kind of standard CNV set (say, for NA12878) as a control, to see whether my own algorithm works.
It seems that, in the 1000 Genomes Project, different groups have identified CNVs in the human genome using different algorithms. See paper below:
I read the paper and the attached supplementary table. The table lists all CNV calls made by the different groups, but I don't see a standard CNV list built from that previous work across the various algorithms.
So is there such a standard? Or could I try to build one from the previous work, using tools like bedtools?
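To make the "build one from the previous work" idea concrete, here is a minimal sketch of one possible approach: keep a call only if at least one other caller reports an interval with 50% reciprocal overlap. This is roughly the pairwise criterion that `bedtools intersect -f 0.5 -r` applies. The `Call` tuple, the function names, the 0.5 threshold, and the two-caller minimum are all assumptions for illustration, not anything defined by the paper.

```python
from collections import namedtuple

# Hypothetical record for one CNV call; "caller" names the algorithm or group.
Call = namedtuple("Call", ["chrom", "start", "end", "caller"])


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions (0.0 if no overlap)."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def consensus_calls(calls, min_overlap=0.5, min_callers=2):
    """Keep calls supported by at least min_callers distinct callers."""
    kept = []
    for call in calls:
        supporters = {call.caller}
        for other in calls:
            if (other.caller != call.caller
                    and reciprocal_overlap(call, other) >= min_overlap):
                supporters.add(other.caller)
        if len(supporters) >= min_callers:
            kept.append(call)
    return kept


example = [
    Call("chr1", 1000, 2000, "A"),
    Call("chr1", 1100, 2100, "B"),  # overlaps the first call by 90%
    Call("chr1", 5000, 6000, "A"),  # no support from another caller
    Call("chr2", 1000, 2000, "C"),  # same coordinates, different chromosome
]
supported = consensus_calls(example)
```

Note that this keeps every supported call as-is, so the same event can appear once per caller; merging the supported intervals into a single record per event would be a separate step. The all-against-all loop is quadratic, which is fine for a sketch but slow for genome-wide call sets.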
There is no such standard that I am aware of, and if there is one, I'd love to know. This is likely because there are (last I counted) 19 CNV-calling algorithms for next-generation sequencing data. Their release set sounds like the agreement from joint calling.
I too think this is an interesting topic. Whole-genome sequencing, like the CNV-calling algorithms themselves, is very good at calling homozygous deletions. But for regions at copy number 1 or (even worse) 3+, they do quite poorly. This presentation from last year has info on CNV calling in 1KG. The paper you cite gives the major algorithms used. Since these perform so poorly for duplications, it might be in your interest to get the Illumina or Affy data files and call CNVs from them for a trio. Alternatively, find a way to use the gold-standard calls from earlier work in their supplementary table 5, and compare them with your results when you run existing algorithms and your own against the same trio. That would make a great paper.
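As a rough illustration of that comparison, the sketch below scores a call set against a truth set using 50% reciprocal overlap. The function names, the `(chrom, start, end)` tuple layout, and the 0.5 threshold are assumptions made for the example; the coordinates are made up and do not come from the supplementary tables.

```python
def reciprocal_match(a, b, min_frac=0.5):
    """True if two (chrom, start, end) intervals overlap reciprocally."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    # The shared span must cover at least min_frac of BOTH intervals.
    return (shared >= min_frac * (a[2] - a[1])
            and shared >= min_frac * (b[2] - b[1]))


def benchmark(calls, truth, min_frac=0.5):
    """Return (sensitivity, precision) of calls against a truth set."""
    found = sum(
        1 for t in truth
        if any(reciprocal_match(t, c, min_frac) for c in calls)
    )
    correct = sum(
        1 for c in calls
        if any(reciprocal_match(c, t, min_frac) for t in truth)
    )
    sensitivity = found / len(truth) if truth else 0.0
    precision = correct / len(calls) if calls else 0.0
    return sensitivity, precision


# Made-up coordinates, purely for illustration.
truth = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
calls = [("chr1", 1100, 2100), ("chr2", 100, 200), ("chr1", 9000, 9500)]
sensitivity, precision = benchmark(calls, truth)
```

Given the point above about deletions versus duplications, it probably makes sense to run this separately per CNV type, so that good performance on homozygous deletions does not hide poor performance on duplications.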
It looks like supplementary table 10 gives the methods (e.g. split read, etc.) but it doesn't say exactly which algorithm.
Rx
Twitter: @delahar
updated 23 months ago by Ram • written 13.2 years ago by Ryan D
Hi everyone
After a few years, I'd like to know whether there are any updates on the question above.
I'm trying some recent CNV tools, but I'm not sure about their specificity.
Thanks in advance