Dear all,
I cannot find this information anywhere. I've checked phase 3 of the 1000 Genomes Project (1000GP): they label a duplication as tandem only if it was called by DELLY, which is not a good criterion for comparison. I've also checked Biostars, and there is no real answer here, even though some questions are somewhat similar.
There are several types of duplications: tandem and interspersed. People usually refer to all interspersed duplications as "segmental duplications", but to me that term implies a high allele frequency.
So the question is: how many interspersed duplications exist in a typical individual human genome if we filter out variants with an allele frequency >1%? I don't need the actual number; a percentage relative to tandem duplications would be fine.
(Motivation for this question: most SV calling tools detect tandem duplications but not interspersed ones, and variants with a minor allele frequency (MAF) >1% are usually considered unimportant. How much will I miss per human genome if I simply don't call interspersed duplications?)
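For a call set that does label both duplication types, the comparison asked about above could be sketched as follows. This is only a minimal sketch under stated assumptions: the INFO field carries an `AF` value and an `SVTYPE` of either `DUP:TANDEM` or `DUP:INT` (a labeling convention used by some callers, not by all of them), and the function names `parse_info`, `count_rare_duplications` and `interspersed_fraction` are hypothetical helpers, not part of any existing tool.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=DUP:INT;AF=0.004' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        info[key] = value
    return info


def count_rare_duplications(vcf_lines, max_af=0.01):
    """Count tandem and interspersed duplications with AF <= max_af.

    Assumption: SVTYPE is 'DUP:TANDEM' or 'DUP:INT'; other records are skipped,
    as are records without a usable AF value.
    """
    labels = {"DUP:TANDEM": "tandem", "DUP:INT": "interspersed"}
    counts = {"tandem": 0, "interspersed": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or empty line
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])  # INFO is the 8th VCF column
        label = labels.get(info.get("SVTYPE"))
        if label is None:
            continue  # not a duplication type we recognise
        try:
            af = float(info["AF"].split(",")[0])  # first ALT allele only
        except (KeyError, ValueError):
            continue
        if af <= max_af:
            counts[label] += 1
    return counts


def interspersed_fraction(counts):
    """Fraction of rare duplications that are interspersed (None if no calls)."""
    total = counts["tandem"] + counts["interspersed"]
    return counts["interspersed"] / total if total else None
```

Usage would be along the lines of `interspersed_fraction(count_rare_duplications(open("calls.vcf")))`, where `calls.vcf` is a placeholder for an uncompressed per-sample SV call set. The result is only as good as the caller's ability to tell the two types apart, which is exactly the open issue in the question.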
To be honest, I've never heard interspersed duplications called 'segmental duplications'. As the name says, segmental duplications are duplications of complete segments of the genome (not just a single gene or a few genes). Over time these might come to look like interspersed duplications, because many of the duplicated genes in that segment will be 'removed' and only a few recognizable ones are kept.
Tandem and interspersed duplicated genes are the result of small-scale duplications (a continuous process in any genome), whereas segmental duplications are large-scale (they only happen, or at least are 'fixed', every so often in the evolution of a genome).
Yep, sometimes they do =) https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz237/5425335 The problem in SV calling is that there is no strict terminology. These authors use a different idea behind "interspersed segmental duplications": they use the word "segment" simply to denote the duplicated segment, without any assumption about the number of genes inside it...
UPD: I was wrong. They used the Chaisson definition of SD ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4745987/ ), so the duplications are actually assumed to be large, whatever that means.