This is pretty much a general question about the human CNV/structural variant field (with next-gen sequencing data, NOT arrays).
As shown in the 1000 Genomes Project, groups have developed approaches based on different algorithms to identify structural variants (mainly three signal types: paired-end, read-depth, and split-read).
However, the results from these approaches barely overlap with each other (of course they have different strengths; split-read, for example, is powerful for small indels), and the false-positive rate seems quite high (or we simply don't know the false-positive rate, because we have no alternative approach to validate small structural variants the way we use array CGH for large ones).
In simple words, I don't trust even mainstream, widely used approaches like BreakDancer and CNVnator (I have somewhat more confidence in Pindel, because it provides nucleotide-resolution breakpoints). Do you trust them?
If not, what should we do? Carry out some post-processing or filtering to reduce the potential false positives? For example, adjust the read-depth threshold for read-depth-based approaches, or restrict attention to calls supported by uniquely-mapping discordant read pairs for paired-end-based approaches (a rough sketch of the latter is below)?
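For illustration, this is roughly the kind of post-filter I have in mind: count high-MAPQ discordant pairs over each candidate call and drop weakly supported ones. A minimal sketch only, assuming pysam and a coordinate-sorted, indexed BAM; the function names and thresholds are mine, not from any particular caller:

```python
import pysam

MIN_MAPQ = 30      # proxy for "uniquely mapping"; adjust to your aligner
MIN_SUPPORT = 4    # minimum discordant pairs required to keep a call

def count_discordant_support(bam, chrom, start, end):
    """Count discordant, non-duplicate, high-MAPQ read pairs in the interval."""
    n = 0
    for read in bam.fetch(chrom, start, end):
        if read.is_unmapped or read.mate_is_unmapped:
            continue
        if read.is_secondary or read.is_duplicate:
            continue
        if read.mapping_quality < MIN_MAPQ:
            continue
        if not read.is_proper_pair:   # discordant by insert size/orientation
            n += 1
    return n

def filter_calls(bam_path, calls):
    """calls: iterable of (chrom, start, end) candidate SVs; yields kept calls."""
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for chrom, start, end in calls:
            if count_discordant_support(bam, chrom, start, end) >= MIN_SUPPORT:
                yield (chrom, start, end)
```

The thresholds are the whole game here, of course; raising MIN_SUPPORT trades sensitivity for specificity, which is exactly the tuning question I'm asking about.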
Or do we need to develop our own code for our specific research? What software do you use (say, CNVnator or BreakDancer)?
Personally, I would say that once sequencing is powerful enough to accurately produce sufficiently long reads, we can say goodbye to these mapping-based methods, because we can simply assemble all the reads, free of the problems caused by repetitive sequences in the human genome.
Thanks. But regarding dbVar: I think comparison with dbVar rests on the hypothesis that human structural variants are mostly common SVs, so that we expect our identified SVs to be found in the database. What if most of our SVs are rare ones? Has this hypothesis been proven?
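To make my point concrete, the comparison I'm picturing is roughly the following: flag each of our calls as known or novel by reciprocal overlap with dbVar intervals. This is only a toy sketch with made-up function names; a real pipeline would use bedtools or similar, and dbVar parsing is omitted:

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if intervals a and b overlap by >= min_frac of both lengths."""
    (ca, sa, ea), (cb, sb, eb) = a, b
    if ca != cb:
        return False
    ov = min(ea, eb) - max(sa, sb)
    return ov > 0 and ov >= min_frac * (ea - sa) and ov >= min_frac * (eb - sb)

def annotate_novelty(our_calls, dbvar_calls):
    """our_calls, dbvar_calls: lists of (chrom, start, end). Returns (known, novel)."""
    known = sum(any(reciprocal_overlap(c, d) for d in dbvar_calls) for c in our_calls)
    return known, len(our_calls) - known
```

My worry is precisely that a high "novel" count in this kind of comparison could mean either false positives or genuinely rare SVs, and the database overlap alone cannot distinguish the two.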