I have lots of ~250bp sequences & need to do local alignment. Tried EMBOSS water & matcher - they are good, but give just the best score. And I need something similar to BLAST2 where all the possible alignments are given. Hope someone could help me.
If water, which performs a full Smith-Waterman alignment, is fast enough for your number of sequences, try ssearch, which does exactly this. It comes with the FASTA tools and has SSE and multi-threading support, so it may be even faster. It will give you all there is to find (up to an E-value of 10 by default), which means you will also see a lot of bad alignments.
lalign36 does exactly what you want. It will show you all the non-overlapping alignments of a pair of sequences.
ssearch36 does something slightly different (and perhaps more like blast2seq); it will show you all of the parts of the target sequence that align with the query, but once part of the target is aligned, it will not be aligned again.
For example, in the sequence X A A B B Y vs Z A B A B V, both lalign36 and ssearch36 should show:
(Here capital letters indicate the actual alignment; lower-case letters indicate context and are not aligned.)

x a A B b y
z A B a b v

x a A B b y
z a b A B v
But lalign36 would also show:
x A a b b y
z A b a b v

x a a b B y
z a b a B v

x a a b B y
z a B a b v

x A a b b y
z a b A b v
Have you tested this? Because the documentation states the contrary: "lalign36 - Calculate multiple, non-intersecting alignments using the sim2 implementation of the Waterman-Eggert algorithm [21] developed by Xiaoqui Huang and Web Miller [7]. Statistical estimates are calculated from Smith-Waterman scores of shuffled sequences." This seems to contradict your example.
Actually, I wanted to test a simple example. While this worked fine for ssearch and fasta, I didn't get any alignment with lalign. I used a target sequence containing a repeat, xyzabbaabbaxyz, and the query abba. If I understand correctly, that should yield 2 local alignments (with equal scores). With fasta, ssearch and glsearch I got 2 each time, but with lalign I got 0 alignments. That might be a bug in lalign, but even so there would be no difference, so your first point about ssearch36 is void; at best there is no difference. I don't understand how an algorithm can be more 'exhaustive' than an exact one.
I just noticed you are the first author of the FASTA tools; if that is the case, please excuse my ignorance! I still don't get two things, though: 1. How come lalign36 (36.3.5c Dec, 2011(preload8)) shows 0 hits? 2. If I understand your example correctly, lalign is supposed to yield less-than-optimal scoring alignments for a segment? If so, I don't see how this is relevant to the application in the original post.
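For reference, the expectation of two equal-scoring local alignments for abba against xyzabbaabbaxyz can be checked with a toy Smith-Waterman fill. This is only a minimal sketch: the scoring (match +1, mismatch -1, linear gap -2) and the function names are my own assumptions, not what ssearch36 or lalign36 use internally, and it reports only where the best-scoring alignments end rather than printing them.

```python
def sw_matrix(query, target, match=1, mismatch=-1, gap=-2):
    """Fill a Smith-Waterman score matrix (toy scoring, linear gap penalty)."""
    rows, cols = len(query) + 1, len(target) + 1
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # extend with a match/mismatch
                H[i - 1][j] + gap,    # gap in the target
                H[i][j - 1] + gap,    # gap in the query
            )
    return H


def best_end_points(H):
    """Return the top score and every (query_pos, target_pos) cell reaching it."""
    best = max(max(row) for row in H)
    ends = [(i, j) for i, row in enumerate(H)
            for j, v in enumerate(row) if v == best and best > 0]
    return best, ends


if __name__ == "__main__":
    H = sw_matrix("abba", "xyzabbaabbaxyz")
    best, ends = best_end_points(H)
    # Positions are 1-based end coordinates in query and target.
    print(best, ends)
```

With this scoring, the top score is reached at two different end positions in the target, one per copy of the repeat, so an exhaustive local aligner would be expected to report both.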
Why not blast2 or megablast then? Too slow? Where did you get the sequences from, how many sequences (reads?) are there, and what is the reference? It is important to know how many sequences there are, because that determines the trade-off between sensitivity and run time. Off the top of my head: try ssearch36 (in the FASTA tools), and if that is too slow try something else, e.g. fasta, megablast or blat.
look here: http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml
~200 sequences so far, and there will be more. The main idea is to align all the sequences against each other (200*200). And I really want to make all those alignments as automatic as possible.
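One way the all-against-all run could be automated, as a rough sketch only. It assumes ssearch36 is on the PATH, that all sequences sit in a single FASTA file (all.fa is a hypothetical name) used as both query and library, and that your fasta36 build accepts -m 8 (BLAST-style tabular output) and -E (E-value cutoff); please check these against the documentation of the installed version. The function names are illustrative.

```python
import subprocess


def build_command(query_fasta, library_fasta, evalue=10.0, program="ssearch36"):
    """Assemble the command line; -m 8 and -E are assumed to be supported."""
    return [program, "-m", "8", "-E", str(evalue), query_fasta, library_fasta]


def parse_tabular(text):
    """Parse 12-column BLAST-style tabular output into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append({
            "query": f[0],
            "subject": f[1],
            "identity": float(f[2]),
            "length": int(f[3]),
            "evalue": float(f[10]),
            "bitscore": float(f[11]),
        })
    return hits


def all_vs_all(fasta_path, evalue=10.0):
    """Search a FASTA file against itself and drop self-hits."""
    result = subprocess.run(
        build_command(fasta_path, fasta_path, evalue),
        capture_output=True, text=True, check=True,
    )
    return [h for h in parse_tabular(result.stdout)
            if h["query"] != h["subject"]]


if __name__ == "__main__":
    for hit in all_vs_all("all.fa"):  # hypothetical input file
        print(hit["query"], hit["subject"], hit["evalue"], sep="\t")
```

Keeping the command builder and the parser separate from the subprocess call makes it easy to swap in another program or output format if ssearch36 turns out to be too slow as the number of sequences grows.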