Finding all possible alignments of two sequences
6.6 years ago
prishly ▴ 10

Hi,

Are there implementations of DNA sequence alignment or motif search algorithms (or extensions of existing algorithms) that find not one but all possible alignments scoring above a certain threshold? Ideally, the output would also include the start and end positions of the shorter sequence's alignment with respect to the longer one.

What I'm looking for is an alignment-based analog of finding multiple instances of a substring within a string. It would be nice to have it as a Perl module. I did a quick search myself, including the BioPerl modules, but couldn't find anything.
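To make the "approximate substring search" idea concrete, here is a minimal sketch in Python (not Perl) of a semi-global dynamic-programming search in the style of Sellers' algorithm. It is an illustration under assumptions, not an existing module: the function names `find_approx_matches` and `best_nonoverlapping` are hypothetical, and the threshold is a maximum edit distance (mismatches plus indels) rather than an alignment score.

```python
def find_approx_matches(pattern, text, max_edits):
    """Return (start, end, edits) for every end position in `text` where
    `pattern` aligns to text[start:end] with at most `max_edits` edits."""
    m = len(pattern)
    # Column 0: aligning pattern[:i] against an empty text costs i edits.
    prev_cost = list(range(m + 1))
    prev_start = [0] * (m + 1)
    hits = []
    for j, ch in enumerate(text, start=1):
        cost = [0] * (m + 1)      # row 0 stays 0: a match may start anywhere
        start = [0] * (m + 1)
        start[0] = j              # an alignment beginning after text[:j]
        for i in range(1, m + 1):
            sub = prev_cost[i - 1] + (pattern[i - 1] != ch)  # match/mismatch
            extra = prev_cost[i] + 1                         # extra text base
            missing = cost[i - 1] + 1                        # missing pattern base
            best = min(sub, extra, missing)
            cost[i] = best
            # Carry along the text position where this alignment started.
            if best == sub:
                start[i] = prev_start[i - 1]
            elif best == extra:
                start[i] = prev_start[i]
            else:
                start[i] = start[i - 1]
        if cost[m] <= max_edits:
            hits.append((start[m], j, cost[m]))
        prev_cost, prev_start = cost, start
    return hits


def best_nonoverlapping(hits):
    """Greedily keep the lowest-edit hits that do not overlap each other."""
    chosen = []
    for start, end, edits in sorted(hits, key=lambda h: (h[2], h[0], h[1])):
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, edits))
    return sorted(chosen)


if __name__ == "__main__":
    raw = find_approx_matches("GATTACA", "TTGATTACATTTTGATCACATT", 1)
    for start, end, edits in best_nonoverlapping(raw):
        print(start, end, edits)
```

The raw hit list contains clusters of neighbouring end positions around each true occurrence, which is why the second function collapses them. Coordinates are 0-based and half-open, so `text[start:end]` is the matched region. The run time is proportional to pattern length times text length, which should be acceptable for short queries such as primers but not for genome-scale references.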

Update: It's an amplicon library that appears to be chimeric after adapter ligation, i.e. most (primer-flanked) fragments were ligated to one another in groups of two, three or more. Sequencing yielded 230,000 reads in all, with an average length of 425 bp (ranging from 6 up to 12k). Amplicon length should be 150-250 bp, much less than 425 bp, hence my suspicions. I'm looking for the primer sequences (about 20 bp each, 10 primer pairs) to use as delimiters to detect (and possibly split) chimeric reads for further processing.

I wrote a Perl script to find exact matches, but it found at least one match in only 47k reads (around 3.6k of which have two or more), whereas all reads should have a match. To account for PCR and sequencing errors, I need to use alignment.
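As a rough illustration of the splitting step described above, the following Python sketch locates primer sites while tolerating a few mismatches and cuts a read into primer-flanked segments. Everything named here is an assumption made for the example: the function names, the toy 6 bp primers and the mismatch limit are hypothetical. It also simplifies the problem in two ways: it counts mismatches only (no indels), and it only looks for amplicons in the forward orientation, whereas ligated fragments could be in either orientation.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming_hits(primer, read, max_mismatches):
    """Return (start, end, mismatches) for each window of `read` that matches
    `primer` with at most `max_mismatches` substitutions."""
    k = len(primer)
    hits = []
    for start in range(len(read) - k + 1):
        mismatches = 0
        for a, b in zip(primer, read[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Loop finished without exceeding the mismatch limit.
            hits.append((start, start + k, mismatches))
    return hits


def split_amplicons(read, fwd, rev, max_mismatches=2):
    """Cut `read` into segments running from a forward-primer site to the
    next downstream reverse-primer site (searched as its reverse complement)."""
    fwd_hits = hamming_hits(fwd, read, max_mismatches)
    rev_hits = hamming_hits(revcomp(rev), read, max_mismatches)
    amplicons = []
    pos = 0  # end of the last emitted segment
    for f_start, f_end, _ in fwd_hits:
        if f_start < pos:
            continue  # this site lies inside a segment already emitted
        for r_start, r_end, _ in rev_hits:
            if r_start >= f_end:
                amplicons.append((f_start, r_end, read[f_start:r_end]))
                pos = r_end
                break
    return amplicons


if __name__ == "__main__":
    fwd, rev = "ACGTAC", "CCTGAA"  # toy primers, purely illustrative
    # Two amplicons joined end to end; the second forward primer has one error.
    read = fwd + "GGGG" + revcomp(rev) + "ACGTTC" + "CCCC" + revcomp(rev)
    for start, end, segment in split_amplicons(read, fwd, rev, max_mismatches=1):
        print(start, end, segment)
```

A read that yields more than one segment would be a candidate chimera. With 10 primer pairs the same search would be repeated per pair, and with real 20 bp primers an edit-distance search that allows indels would be the safer choice, since sequencing errors are not limited to substitutions.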

alignment dna • 1.8k views
6.6 years ago

It would help if you gave an idea of the size and number of your query and reference sequences. Do you want to do this for tens of sequences or for millions? How long are they: tens of bases or full genomes? I'm asking because some options may or may not be feasible depending on the scale of your problem.

Anyway, have a look at vmatch. In particular, section 9.9.2 of the manual (Matching Queries against an Index - Computing Substring Matches) has a use case that may suit you. Note that this is a stand-alone program, not a module.

Thank you. I've included some new info in my post.
