Sequence Alignment Utility Sought
14.1 years ago

I seek a utility that will perform a multiple alignment of structured DNA sequences. The sequences are a set of degenerate repeat polymorphisms. The number of repeats ranges from 7 to 24 subunits in the polymorphic region. The repeat units have a four-subunit structure, and each subunit appears to vary in a non-random fashion, independently of the other three subunits within a repeat unit. All told, there are well over 150 distinct repeat units that can appear in the polymorphic region, and there are fewer variants of each subunit, meaning that this could also be seen as a permutation/combination problem.

While these are DNA sequences, I wish to do an alignment based purely on the repeat units. I can already construct similarity matrices for the repeat units to test different hypotheses. The problem is that a manual alignment would be prohibitively tedious with the number of sequences I have.
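To make the idea of aligning "purely on the repeat units" concrete, here is a minimal sketch of a pairwise global alignment (Needleman-Wunsch) in which each symbol is a whole repeat unit rather than a nucleotide. The unit labels, the `score` callback (standing in for a lookup into a similarity matrix), and the linear gap penalty are all assumptions made for illustration; this is not an existing utility and it only handles two sequences at a time.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Unit = str  # hypothetical label for one repeat unit, e.g. "U17"
Pair = Tuple[Optional[Unit], Optional[Unit]]


def align_units(
    a: Sequence[Unit],
    b: Sequence[Unit],
    score: Callable[[Unit, Unit], float],
    gap: float = -1.0,
) -> Tuple[float, List[Pair]]:
    """Global alignment of two sequences of repeat units.

    `score` is assumed to look up a unit-vs-unit similarity;
    `gap` is an assumed linear penalty per unaligned unit.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two units
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

Swapping in a different `score` function is how different similarity hypotheses could be tested without touching the alignment code.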

sequence multiple • 2.7k views

Is it possible to determine which repeats are orthologous based on the sequence? With normal repeats this is often impossible, but you say these are degenerate?

That specific detail I can do with fairly simple pairwise comparisons of all units, partitioned into subunits. Unfortunately, that still does not solve the problem of the multiple alignment of the actual complete repeat sequences (over 30 sequences).
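As a rough illustration of comparing units "partitioned into subunits", the sketch below scores two repeat units by comparing their four subunits one slot at a time and averaging. The tuple representation, the position-wise identity measure, and the optional per-slot weights are assumptions chosen for the example, not the scoring actually used for these data.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical representation: one repeat unit = four subunit strings.
RepeatUnit = Tuple[str, str, str, str]


def subunit_identity(x: str, y: str) -> float:
    """Fraction of identical positions; a length difference counts as mismatches."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0
    matches = sum(1 for p, q in zip(x, y) if p == q)
    return matches / longest


def unit_similarity(
    u: RepeatUnit,
    v: RepeatUnit,
    weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),
) -> float:
    """Weighted mean of the four subunit identities (weights are an assumption)."""
    total = sum(weights)
    return sum(w * subunit_identity(s, t) for w, s, t in zip(weights, u, v)) / total


def similarity_matrix(units: Sequence[RepeatUnit]) -> Dict[Tuple[int, int], float]:
    """All-against-all unit similarities, keyed by (index, index)."""
    return {
        (i, j): unit_similarity(units[i], units[j])
        for i in range(len(units))
        for j in range(len(units))
    }
```

Because each subunit slot is scored separately, changing the weights is a cheap way to test whether one subunit carries more signal than the others.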

13.4 years ago
Iain ▴ 260

Hi Bryan,

In case you haven't found a solution to your problem:

If it is possible to calculate all pairwise alignments of the units that you are interested in, you should be able to generate the multiple sequence alignment using the T-Coffee program.

All the pairwise alignments should be converted into a library, which T-Coffee can do. These libraries assign a weight to each pair of aligned residues. The T-Coffee algorithm then searches for the multiple sequence alignment that maximizes the sum of these weights.
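To illustrate the library idea only, here is a small sketch that turns pairwise alignments into a table of weighted position pairs and scores a candidate multiple alignment against it. The in-memory dictionary layout and the percent-identity weighting are assumptions for the example; this is not T-Coffee's library file format or its actual code.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Sequence, Tuple

# One pairwise alignment column: (index in A or None, index in B or None).
Column = Tuple[Optional[int], Optional[int]]
# Library key: (name_a, position_a, name_b, position_b).
PairKey = Tuple[str, int, str, int]
Library = DefaultDict[PairKey, float]


def add_pairwise(
    library: Library,
    name_a: str, seq_a: Sequence[str],
    name_b: str, seq_b: Sequence[str],
    columns: Sequence[Column],
) -> None:
    """Add one pairwise alignment; every matched pair gets the alignment's
    percent identity as its weight (an assumed weighting scheme)."""
    matched = [(i, j) for i, j in columns if i is not None and j is not None]
    if not matched:
        return
    identical = sum(1 for i, j in matched if seq_a[i] == seq_b[j])
    weight = 100.0 * identical / len(matched)
    for i, j in matched:
        library[(name_a, i, name_b, j)] += weight


def msa_score(library: Library, msa: Dict[str, List[Optional[int]]]) -> float:
    """Sum of library weights over all position pairs sharing an MSA column.

    `msa` maps each sequence name to one entry per column: the index of the
    residue placed there, or None for a gap.
    """
    if not msa:
        return 0.0
    names = sorted(msa)
    n_cols = len(msa[names[0]])
    total = 0.0
    for col in range(n_cols):
        for x in range(len(names)):
            for y in range(x + 1, len(names)):
                i, j = msa[names[x]][col], msa[names[y]][col]
                if i is not None and j is not None:
                    total += library.get((names[x], i, names[y], j), 0.0)
                    total += library.get((names[y], j, names[x], i), 0.0)
    return total


library: Library = defaultdict(float)
```

A multiple alignment that keeps more of the pairwise-supported pairs in the same column gets a higher `msa_score`, which is the quantity described above.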

You might have to do some sequence manipulation to generate the initial alignments (or T-Coffee might handle this natively):

  1. extract the sequence for the subunits you want to align
  2. calculate the alignment
  3. append the sequence removed in step 1 back onto the alignment, so that the full-length sequences in the alignment are the same as the input sequences, with only the subunits aligned.
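The three steps above can be sketched roughly as follows. The region coordinates, the helper names, and the choice to pad the unaligned flanks with gap characters are assumptions for illustration; the aligned core strings would come from whatever aligner is used in step 2.

```python
from typing import Dict, Tuple

GAP = "-"


def split_region(seq: str, start: int, end: int) -> Tuple[str, str, str]:
    """Step 1: cut a sequence into (left flank, region to align, right flank)."""
    return seq[:start], seq[start:end], seq[end:]


def reattach_flanks(
    aligned_cores: Dict[str, str],
    flanks: Dict[str, Tuple[str, str]],
) -> Dict[str, str]:
    """Step 3: put the flanks back around the aligned regions.

    Flanks are not aligned to each other; they are only padded with gaps
    so that every row of the result has the same length.
    """
    left_width = max(len(left) for left, _ in flanks.values())
    right_width = max(len(right) for _, right in flanks.values())
    rows = {}
    for name, core in aligned_cores.items():
        left, right = flanks[name]
        rows[name] = left.rjust(left_width, GAP) + core + right.ljust(right_width, GAP)
    return rows


# Hypothetical input sequences and region coordinates.
seqs = {"s1": "GGACGTACGTTT", "s2": "GACGTC"}
regions = {"s1": (2, 10), "s2": (1, 5)}

parts = {name: split_region(seqs[name], *regions[name]) for name in seqs}
flanks = {name: (left, right) for name, (left, _, right) in parts.items()}

# Step 2 would run an aligner on the extracted regions; this result is made up.
aligned_cores = {"s1": "ACGTACGT", "s2": "ACGT----"}

full = reattach_flanks(aligned_cores, flanks)
```

Removing the gap characters from each row of `full` should give back the original input sequence, which is a quick sanity check on the reassembly.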

The code is available here.
