Hi,
Are there implementations of DNA sequence alignment or motif search algorithms (or extensions of existing algorithms) that find not just one but all possible alignments above a certain threshold? Ideally, they would also report the start and end positions of the aligned shorter sequence with respect to the longer one.
What I'm looking for is an alignment-based analog of finding multiple instances of a substring within a string. It would be nice to have it in a Perl module. I did a quick search myself, including the BioPerl modules, but couldn't find anything.
Update: It's an amplicon library that appears to be chimeric after adapter ligation, i.e. most (primer-flanked) fragments are ligated to one another in groups of two, three, or more. Sequencing yielded 230,000 reads in all, with an average length of 425 bp (ranging from 6 bp up to 12 kb). Amplicon length should be 150-250 bp, much less than 425 bp, hence my suspicions. I'm looking for the primer sequences (about 20 bp each, 10 primer pairs) to use as delimiters to detect (and possibly split) chimeric reads for further processing.
I wrote a Perl script to find exact matches, but it found at least one match in only 47k reads (around 3.6k have two or more), whereas all reads should have a match. To account for PCR and sequencing errors, I need to use alignment.
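To make the request concrete, here is a minimal sketch of the kind of search I mean: approximate substring matching by semi-global alignment (often called Sellers' algorithm), where the primer must be aligned in full but may start anywhere in the read. It is written in Python rather than Perl purely for illustration. The function names `find_approx` and `collapse`, the unit costs for mismatches and indels, and the toy sequences are all my own assumptions, not taken from an existing module.

```python
def find_approx(pattern, text, max_errors):
    """Report every end position in `text` where `pattern` aligns with
    edit distance <= max_errors. Returns (start, end, errors) tuples,
    0-based with `end` exclusive."""
    m = len(pattern)
    # Column 0: pattern[:i] against an empty text prefix costs i deletions.
    prev_cost = list(range(m + 1))
    prev_start = [0] * (m + 1)
    hits = []
    for j, ch in enumerate(text, start=1):
        # Row 0 costs nothing: a match may begin at any text position.
        cur_cost = [0] * (m + 1)
        cur_start = [j] * (m + 1)
        for i in range(1, m + 1):
            sub = prev_cost[i - 1] + (pattern[i - 1] != ch)  # match/mismatch
            dele = cur_cost[i - 1] + 1   # pattern base missing from the read
            ins = prev_cost[i] + 1       # extra base in the read
            best = min(sub, dele, ins)
            cur_cost[i] = best
            # Carry along the start position of the alignment that was chosen.
            if best == sub:
                cur_start[i] = prev_start[i - 1]
            elif best == dele:
                cur_start[i] = cur_start[i - 1]
            else:
                cur_start[i] = prev_start[i]
        if cur_cost[m] <= max_errors:
            hits.append((cur_start[m], j, cur_cost[m]))
        prev_cost, prev_start = cur_cost, cur_start
    return hits


def collapse(hits):
    """Merge runs of consecutive end positions (the same occurrence seen
    with slightly different ends) into the lowest-error hit of each run."""
    best = []
    last_end = None
    for start, end, err in hits:
        if last_end is not None and end == last_end + 1:
            if err < best[-1][2]:
                best[-1] = (start, end, err)
        else:
            best.append((start, end, err))
        last_end = end
    return best


if __name__ == "__main__":
    primer = "ACGTACGT"                            # toy primer, not a real one
    read = "TTTTACGTACGTGGGGACGAACGTCC"            # exact copy + 1-mismatch copy
    for start, end, err in collapse(find_approx(primer, read, max_errors=1)):
        print(start, end, err, read[start:end])
```

Run time is proportional to primer length times read length, so searching 20 primers against every read should be feasible. Primers on the opposite strand would need a second pass with the reverse complement, and the reported hit coordinates could then serve as split points for chimeric reads.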
Similar question: Given two short sequences (~1000 bp) I want to find all local alignments between them.