The scenario. Assume you have an accurate and staggeringly cheap sequencing technology that allows you to sequence and assemble human chromosomes to (near) completion, say in under 100 large contigs. Now say you have the distinct pleasure of doing this for multiple individuals.
Let's permit another leap and assume that you have a really fantastic aligner that allows you to precisely align (pairwise) the chromosomes from two or more samples in mere minutes.
The problem. Fantastic. Now, what "grammars" exist for describing the differences between any two chromosomes from two different individuals (e.g., chr1 from Jim-Bob and Mary-Sue)? The grammar must account for SNPs, INDELs, and chromosomal rearrangements (e.g., inversions, duplications, deletions, insertions, translocations). The closest thing I know of is the CIGAR string, but it doesn't allow for changes in strand, duplications, etc.
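To illustrate the CIGAR limitation, here is a minimal Python sketch that parses a CIGAR string into its operations. The operation codes are the ones defined in the SAM specification; the function names (`parse_cigar`, `reference_span`) and the example string are my own, made up for illustration. Note that the whole alphabet describes events along one colinear alignment: there is no code for a strand flip, a duplication, or a translocation.

```python
import re

# Operation codes defined by the SAM specification for CIGAR strings.
CIGAR_OPS = {
    "M": "alignment match (match or mismatch)",
    "I": "insertion relative to the reference",
    "D": "deletion relative to the reference",
    "N": "skipped region in the reference",
    "S": "soft clip",
    "H": "hard clip",
    "P": "padding",
    "=": "sequence match",
    "X": "sequence mismatch",
}

_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) tuples; raise on malformed input."""
    tokens = _CIGAR_TOKEN.findall(cigar)
    # Re-joining the tokens must reproduce the input, otherwise something
    # in the string was not a valid <length><op> pair.
    if "".join(n + op for n, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in tokens]


def reference_span(ops):
    """Number of reference bases consumed (M, D, N, = and X consume the reference)."""
    return sum(n for n, op in ops if op in "MDN=X")


# Made-up example: matches, one SNP, a 2 bp insertion and a 5 bp deletion.
ops = parse_cigar("50=1X30=2I40=5D20=")
for length, op in ops:
    print(length, op, CIGAR_OPS[op])
print("reference bases covered:", reference_span(ops))
```

SNPs and indels fit comfortably, but an inverted or duplicated segment can only be reported as a second, separate alignment record, which is exactly the gap I am asking about.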
Surely one must exist, but I can't find it. Any suggestions of literature to read?
Nice thought experiment, but if such a thing existed, how would it be used? That is, what would its advantage be compared to using a GFF or assembly-style file based on a common reference chromosome?
The assumption is that there will soon be many "reference" genomes and that a newly-sequenced genome will ultimately be higher quality than the reference.
Does it matter how good the 'reference' is? If every sample is compared against it, surely the worst that can happen is that many samples share the same 'differences' from the reference.
Assume you have 100 completely assembled genomes in addition to the reference. Your lab is interested in gene X. How would you precisely compare each allele of gene X among every genome, while accounting for duplications, inversions, etc., of the gene and its up/downstream sequence? Multiple alignments would keep things "registered", but they lack a robust framework on which to build the tools needed for the comparisons. Or perhaps I am missing something obvious that already exists (a common occurrence, hence the question).
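To make the ask concrete, here is a rough Python sketch of the kind of query I have in mind. Everything in it is hypothetical: the `Block` record, the `summarize_locus` function, and the coordinates are invented for illustration and do not correspond to any existing format or tool. It assumes each genome's alignment to the reference is available as a list of colinear blocks carrying a strand, and it uses the crude simplification that a block counts as a copy of the gene only if it spans the whole gene interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One aligned segment between the reference and a sample (hypothetical record)."""
    ref_start: int     # 0-based, inclusive
    ref_end: int       # exclusive
    sample_start: int
    sample_end: int
    strand: str        # "+" for same orientation, "-" for inverted


def summarize_locus(blocks, gene_start, gene_end):
    """Count full-length copies of a reference interval in one sample.

    Simplification: only blocks that span the entire interval are counted,
    so a copy interrupted by a large indel would be missed.
    """
    spanning = [b for b in blocks
                if b.ref_start <= gene_start and b.ref_end >= gene_end]
    return {
        "copies": len(spanning),
        "inverted": sum(1 for b in spanning if b.strand == "-"),
    }


# Made-up data: sample B carries a second, inverted copy of the region.
genomes = {
    "sampleA": [Block(0, 10000, 0, 10000, "+")],
    "sampleB": [Block(0, 10000, 0, 10000, "+"),
                Block(4000, 6000, 10000, 12000, "-")],
}

gene_x = (4500, 5500)  # hypothetical reference coordinates of gene X
for name, blocks in genomes.items():
    print(name, summarize_locus(blocks, *gene_x))
```

A real grammar would also need to nest SNPs and indels inside each block and handle partial copies, which is where I run out of existing conventions.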
Agreed: copy number variants are problematic, but +1 for the cortex assembler answer below