Question

Finding all possible alignments of two sequences

0

7.4 years ago

prishly ▴ 10

Hi,

Are there implementations of DNA sequence alignment or motif search algorithms (or extensions of existing algorithms) that find not one but all possible alignments above a certain threshold? Ideally it would also report the start and end positions of the aligned shorter sequence with respect to the longer one.

What I'm looking for is an alignment-based analog of finding multiple instances of a substring within a string. It would be nice to have it as a Perl module. I did a quick search myself, including the BioPerl modules, but couldn't find anything.

Update: It's an amplicon library that appears to be chimeric after adapter ligation, i.e. most (primer-flanked) fragments are ligated to one another in groups of two, three, or more. Sequencing yielded 230,000 reads in all, average length 425 (from 6 up to 12k). Amplicon length should be 150-250 bp, much less than 425, hence my suspicions. I want to use the primer sequences (about 20 bp, 10 primer pairs) as delimiters to detect (and possibly split) chimeric reads for further processing.

I wrote a Perl script to find exact matches, but it found at least one match in only 47k reads (around 3.6k have two or more), whereas every read should have a match. To account for PCR and sequencing errors, I need to use alignment.

alignment dna • 2.0k views

2

Similar question: Given two short sequences (~1000 bp) I want to find all local alignments between them.

7.4 years ago by Andrzej Zielezinski 11k

score 1 · Answer 1 · 2018-03-14

7.4 years ago

dariober 15k

It would help if you gave an idea of the size and number of your query and reference sequences. Do you want to do this for tens or for millions of sequences? How long are they? Tens of bases or full genomes? I'm asking because some options may or may not be feasible depending on the scale of your problem.

Anyway, have a look at vmatch. In particular, section 9.9.2 of the manual (Matching Queries against an Index - Computing Substring Matches) has a use case that may suit you. Note that this is a stand-alone program, not a module.
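To make "all matches above a threshold, with start and end positions" concrete, here is a minimal sketch of one way to do it by hand. It is not vmatch and it is not Perl: it is a plain Python version of semi-global edit-distance matching (often attributed to Sellers), where the short sequence must be aligned in full but may start anywhere in the long one. The function names `find_approx` and `pick_non_overlapping`, the `max_edits` threshold, and the toy sequences are all made up for illustration.

```python
def find_approx(pattern, text, max_edits):
    """Report (start, end, edits) for every end position in `text` where
    `pattern` aligns with at most `max_edits` edits (substitutions or indels).
    Positions are 0-based and end-exclusive, so text[start:end] is the hit."""
    m = len(pattern)
    # Column for the empty text prefix: pattern[:i] against nothing costs i edits.
    dist = list(range(m + 1))
    start = [0] * (m + 1)
    hits = []
    for j, ch in enumerate(text, start=1):
        # Row 0 costs nothing: an alignment may begin at any text position.
        new_dist = [0] * (m + 1)
        new_start = [j] * (m + 1)
        for i in range(1, m + 1):
            sub = dist[i - 1] + (pattern[i - 1] != ch)  # match or substitution
            ins = dist[i] + 1                           # extra base in the text
            dele = new_dist[i - 1] + 1                  # pattern base missing from text
            best = min(sub, ins, dele)
            new_dist[i] = best
            # Carry along where the chosen alignment began.
            if best == sub:
                new_start[i] = start[i - 1]
            elif best == ins:
                new_start[i] = start[i]
            else:
                new_start[i] = new_start[i - 1]
        dist, start = new_dist, new_start
        if dist[m] <= max_edits:
            hits.append((start[m], j, dist[m]))
    return hits


def pick_non_overlapping(hits):
    """Greedy clean-up: keep the lowest-edit hits and drop any hit that
    overlaps one already kept. A simple heuristic, not an optimal selection."""
    chosen = []
    for s, e, d in sorted(hits, key=lambda h: (h[2], h[0], h[1])):
        if all(e <= cs or s >= ce for cs, ce, _ in chosen):
            chosen.append((s, e, d))
    return sorted(chosen)


if __name__ == "__main__":
    primer = "ACGTACGT"                  # toy primer, not real data
    read = "GGACGTACGTCCCCACGAACGTTT"    # one exact copy, one copy with a mismatch
    for s, e, d in pick_non_overlapping(find_approx(primer, read, max_edits=1)):
        print(s, e, d, read[s:e])
```

The raw output of `find_approx` contains a cluster of hits around each true occurrence (one per acceptable end position), which is why the second function thins them out. The cost is proportional to pattern length times text length per pair, so pure Python like this would be slow on hundreds of thousands of reads; it is meant to show the idea, and an indexed tool such as vmatch is the better fit at scale.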