Program to merge two sequences by the longest path
7.6 years ago
dmathog ▴ 40

Has anyone ever encountered a program which aligns two sequences and then merges them by following the longest path?

That is, when it comes to an indel, it always uses the insertion rather than the deletion.

This would be used more or less like EMBOSS megamerger, except that program works by crossing over between designated "front" and "back" sequences, and this one needs to follow a more complex path through the alignment array.

I can see ways to edit the output of something like "ggsearch" into the desired result, for example:

             300       310       320       330       340       350
seq1  agagagagagaaagaaagaaagaaagaagagaaagacacatatagatatatagagagaga
      ::::::::::::::::::::::::    :::::::::::::::::::::::::: :::::
seq2  AGAGAGAGAGAAAGAAAGAAAGAA----GAGAAAGACACATATAGATATATAGATAGAGA
             430       440           450       460       470      

becomes

seq12 AGAGAGAGAGAAAGAAAGAAAGAAagaaGAGAAAGACACATATAGATATATAGATAGAGA

but was hoping for a prepackaged solution.
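
For the pairwise case, the edit described above can be sketched in a few lines of Python. This is only a rough illustration, not a packaged tool: the function name `merge_pair` is made up here, and it assumes the two aligned rows have already been extracted from the aligner output as equal-length strings with "-" for gaps. Where both rows have a base it keeps the second row's base (as in the seq12 example), and it falls back to the first row wherever the second row has a gap.

```python
def merge_pair(row1, row2, gap="-"):
    """Merge two aligned rows, always keeping the insertion at an indel.

    Assumption: row1 and row2 are the aligned rows of equal length,
    already extracted from the aligner output (hypothetical input format).
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    merged = []
    for a, b in zip(row1, row2):
        if b != gap:
            merged.append(b)      # second row has a base: keep it
        elif a != gap:
            merged.append(a)      # second row is gapped: take the first row's base
        # both gapped: emit nothing for this column
    return "".join(merged)


if __name__ == "__main__":
    # Toy rows, lower case for row 1 and upper case for row 2
    print(merge_pair("acgt--a", "AC--GGA"))
```

Keeping the two inputs in different cases, as in the example above, makes it easy to see which sequence each stretch of the merged result came from.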

Thanks.


How about writing a script to parse the output and compare the bases at every position in both sequences?


That's pretty much what I ended up doing, and wrote it in the general form for an MSA rather than just a pair of sequences. The algorithm is this:

  A. load sequences from MSA in fasta format and sanity check.
  B. for each sequence:
     Pass 1.  find minimum distance from each position to the nearest gap 
         or sequence end. 0 for positions which are gaps.
     Pass 2.  for each run of nonzero distances replace all values with the
         maximum distance. (1 2 3 2 1 becomes 3 3 3 3 3)
  C. over all sequence positions, find the sequence with the largest 
     distance value and emit that base.
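
The steps above could look roughly like this in Python. This is a minimal sketch written for this post, not the actual script: the names `parse_fasta`, `run_scores` and `merge_msa` are illustrative, gaps are assumed to be "-", ties between sequences are assumed to go to the earliest sequence in the file, and columns where every sequence has a gap are assumed to emit nothing.

```python
def parse_fasta(text):
    """Step A: read aligned sequences from FASTA text and sanity check them."""
    seqs, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = []
            seqs.append(current)
        elif line and current is not None:
            current.append(line)
    seqs = ["".join(parts) for parts in seqs]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected at least one sequence, all of equal length")
    return seqs


def run_scores(seq, gap="-"):
    """Step B: score every position by the peak gap distance of its run."""
    scores = [0] * len(seq)           # gap positions keep a score of 0
    i = 0
    while i < len(seq):
        if seq[i] == gap:
            i += 1
            continue
        j = i
        while j < len(seq) and seq[j] != gap:
            j += 1                    # seq[i:j] is one ungapped run
        # Pass 1: distance to the nearest gap or sequence end (1 2 3 2 1)
        dist = [min(k - i + 1, j - k) for k in range(i, j)]
        # Pass 2: replace the whole run with its maximum (3 3 3 3 3)
        peak = max(dist)
        for k in range(i, j):
            scores[k] = peak
        i = j
    return scores


def merge_msa(seqs, gap="-"):
    """Step C: at each column emit the base from the best-scoring sequence."""
    scores = [run_scores(s, gap) for s in seqs]
    merged = []
    for col in range(len(seqs[0])):
        # max() returns the first index on ties (assumed tie-break rule)
        best = max(range(len(seqs)), key=lambda s: scores[s][col])
        if scores[best][col] > 0:     # skip columns that are all gaps
            merged.append(seqs[best][col])
    return "".join(merged)


if __name__ == "__main__":
    fasta = ">seq1\nACGT--ACGTAC\n>seq2\nac--ttac----\n"
    print(merge_msa(parse_fasta(fasta)))
```

Because pass 2 flattens each run to a single value, the choice between sequences only changes at gap boundaries, so the merged result follows the longest ungapped stretch rather than switching base by base.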