I just came across a paper from last year which argues that reference genomes need to include common structural variation, so that sequencing data can be mapped to them more reliably. That in turn should make it easier to interpret less common (and perhaps smaller?) variation from resequencing data in the right context (in personal genomics, but also in model organisms).
For this, the reference would need to change from a set of linear sequences to a set of graphs, one for each chromosome, and mappers would need to be able to work with these new references. Mapping data would then result in a path through this graph, which would identify the broader population group(s) (ancestry) of your sample. Annotation of the reference genome would also somehow need to work with multiple paths through the graph instead of a single path (coordinate scale) on a linear sequence.
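To make the idea concrete for myself, here is a minimal toy sketch in Python of what "mapping to a graph" could mean. It is not taken from the paper or from any existing mapper: the node names, the sequences and the helpers `enumerate_paths`, `spell` and `map_read` are all made up for illustration. It uses exact string matching over every path, whereas a real mapper would need an index and would have to tolerate mismatches.

```python
from typing import Dict, List, Tuple

# Toy reference graph for one locus. Node names and sequences are hypothetical.
NODES: Dict[str, str] = {
    "ref_left": "ACGTAC",
    "allele_A": "GGG",      # allele present in the linear reference
    "allele_B": "GGTTTG",   # made-up common structural variant (insertion)
    "ref_right": "TTCAGA",
}

# Directed edges: each node lists the nodes that may follow it.
EDGES: Dict[str, List[str]] = {
    "ref_left": ["allele_A", "allele_B"],
    "allele_A": ["ref_right"],
    "allele_B": ["ref_right"],
    "ref_right": [],
}


def enumerate_paths(start: str, edges: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from `start` to a node without successors."""
    paths: List[List[str]] = []
    stack: List[List[str]] = [[start]]
    while stack:
        path = stack.pop()
        successors = edges[path[-1]]
        if not successors:
            paths.append(path)
        for nxt in successors:
            stack.append(path + [nxt])
    return paths


def spell(path: List[str], nodes: Dict[str, str]) -> str:
    """Concatenate the node sequences along a path."""
    return "".join(nodes[name] for name in path)


def map_read(read: str, start: str = "ref_left") -> List[Tuple[List[str], int]]:
    """Return (path, offset) for every path whose sequence contains `read` exactly."""
    hits: List[Tuple[List[str], int]] = []
    for path in enumerate_paths(start, EDGES):
        offset = spell(path, NODES).find(read)
        if offset != -1:
            hits.append((path, offset))
    return hits


if __name__ == "__main__":
    # A read spanning the hypothetical insertion maps only to the allele_B path.
    print(map_read("ACGGTTTGTT"))
    # A read from the shared left flank maps to both paths.
    print(map_read("ACGT"))
```

In this toy, a read that spans the variant is placed on exactly one path, while a read from a shared flank is compatible with both, which is roughly why the set of paths a sample supports could say something about which haplotypes it carries. It also shows the annotation problem: the same node (`ref_right`) sits at a different offset depending on the path taken.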
This got me interested in the current status of including common structural variation in references, and in the software and data ecosystem around it. Are there any such references or mappers yet? Will all common SNP / indel / CNV / SV data move into the reference?
The paper is from the Genome Reference Consortium (GRC) (EBI, Sanger, NCBI etc.): http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001091
Thanks for the very interesting reference.