I am looking for a utility that will perform a multiple alignment of structured DNA sequences. The sequences are a set of degenerate repeat polymorphisms. The number of repeats ranges from 7 to 24 subunits in the polymorphic region. The repeat units have a four-subunit structure, and each subunit appears to vary in a non-random fashion, independently of the other three subunits within a repeat unit. All told, there are well over 150 distinct repeat units that can appear in the polymorphic region, and fewer distinct variants of each subunit, so this could also be seen as a permutation/combination problem.
While these are DNA sequences, I wish to do an alignment based purely on the repeat units. I can already construct similarity matrices for the repeat units to test different hypotheses. The problem is that a manual alignment would be prohibitively tedious with the number of sequences I have.
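To make the goal concrete, here is a minimal sketch of a global (Needleman-Wunsch style) alignment in which the alphabet is the set of repeat units rather than nucleotides. It is only a pairwise building block, not a multiple aligner, and everything named in it is an assumption for illustration: the function `align_units`, the single-letter unit labels, the linear gap penalty, and the toy `score` function, which would be replaced by a lookup into a real unit similarity matrix.

```python
def align_units(seq_a, seq_b, score, gap=-1.0):
    """Globally align two sequences of repeat-unit tokens.

    score(x, y) returns the similarity of two units (hypothetical callback);
    gap is a linear gap penalty. Returns (total_score, aligned_a, aligned_b),
    where None marks a gap.
    """
    n, m = len(seq_a), len(seq_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Borders are accumulated so the traceback comparisons below are exact.
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                dp[i - 1][j] + gap,   # unit in seq_a aligned to a gap
                dp[i][j - 1] + gap,   # unit in seq_b aligned to a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            out_a.append(seq_a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], out_a[::-1], out_b[::-1]


# Toy scoring (assumption): identical units score +1, different units -1.
def toy_score(x, y):
    return 1.0 if x == y else -1.0


total, aligned_a, aligned_b = align_units(
    ["A", "B", "C", "D"], ["A", "C", "D"], toy_score
)
```

A multiple alignment of 30+ sequences would still need something on top of this, for example a progressive scheme that merges pairwise alignments in order of similarity; the sketch does not attempt that.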
Is it possible to determine which repeats are orthologous based on the sequence? With normal repeats this is often impossible, but you say these are degenerate?
I can handle that specific detail with fairly simple pairwise comparisons of all units, partitioned into subunits. Unfortunately, that still does not solve the problem of the multiple alignment of the actual complete repeat sequences (over 30 sequences).
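For reference, the pairwise unit comparison can be sketched roughly as below. This is an illustration under stated assumptions, not the actual scoring in use: each unit is represented as a tuple of four subunit labels, similarity is the weighted fraction of subunit positions that agree, and the unit names, subunit labels, and equal weights are all hypothetical.

```python
from itertools import combinations


def unit_similarity(a, b, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted fraction of subunit positions at which two units agree."""
    if len(a) != len(weights) or len(b) != len(weights):
        raise ValueError("each unit must have one subunit per weight")
    matched = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return matched / sum(weights)


def similarity_matrix(units):
    """Build a symmetric {(name1, name2): similarity} lookup for all units."""
    names = sorted(units)
    matrix = {(name, name): 1.0 for name in names}
    for p, q in combinations(names, 2):
        s = unit_similarity(units[p], units[q])
        matrix[(p, q)] = matrix[(q, p)] = s
    return matrix


# Hypothetical repeat units: name -> four subunit labels.
units = {
    "U1": ("a1", "b1", "c1", "d1"),
    "U2": ("a1", "b2", "c1", "d1"),
    "U3": ("a2", "b2", "c3", "d4"),
}
sim = similarity_matrix(units)
```

Changing the weights, or replacing the equality test with a per-subunit substitution score, is one way to encode different hypotheses about how the subunits vary.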