Hey,
for a project I need to run pairwise sequence alignments (local ones, i.e. Smith-Waterman) on protein sequences in order to produce a similarity matrix based on the percentage of identical aligned residues. One approach would be to use BLAST for that: first create a BLAST database (using formatdb) from the FASTA file that contains all the sequences to be compared pairwise. Then, for each sequence in the file, run BLAST against that database. But that wouldn't return alignments against _all_ sequences, only those below the given E-value threshold. Setting the E-value to some huge number doesn't seem like a proper solution to me (which E-value should I use? Besides, when it is set too high, the number of returned alignments exceeds the number of sequences in the database). Is there another way to do this with BLAST? I'd rather stick with BLAST if possible before switching to some other, less popular aligner.
Thanks, Chris
As I said in my answer, you may have a pair of proteins in which one of the two has a duplicated domain; to continue your example, you could have A,B and A,A,B. In that case, the local alignment doesn't give you any useful information on the distance between the proteins. In any case, I agree with darked89 that you have to pre-filter the proteins first, otherwise your comparisons don't make sense. The distance you want to calculate is a measure of how many mutations are needed to get from one sequence to another, but if the sequences don't have a common origin, you can't do that.
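To make the A,B versus A,A,B case concrete, here is a rough sketch in plain Python. It is not what BLAST does internally: the scoring (match=2, mismatch=-1, linear gap=-2) is an assumption chosen for readability, whereas real protein work would normally use a substitution matrix such as BLOSUM62 with affine gaps. The function names and the two toy "domains" are made up for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (toy scoring, linear gap penalty).

    Returns the two aligned strings, with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical columns as a percentage of the alignment length."""
    if not aln_a:
        return 0.0
    same = sum(x == y for x, y in zip(aln_a, aln_b) if x != "-")
    return 100.0 * same / len(aln_a)


def identity_matrix(seqs):
    """All-vs-all percent identity from local alignments (quadratic in len(seqs))."""
    return [[percent_identity(*smith_waterman(s, t)) for t in seqs] for s in seqs]


# Two made-up "domains" with no residues in common.
dom_a = "MKTAYIAKQR"
dom_b = "WLPHGDSECV"
seq1 = dom_a + dom_b           # A,B
seq2 = dom_a + dom_a + dom_b   # A,A,B

aln1, aln2 = smith_waterman(seq1, seq2)
identity = percent_identity(aln1, aln2)
coverage2 = len(aln2.replace("-", "")) / len(seq2)  # fraction of seq2 in the alignment
print(identity, coverage2)
```

With these toy sequences the local alignment covers all of A,B but only the last two domains of A,A,B, so the percent identity over the aligned region says nothing about the extra copy of A. Reporting the alignment coverage next to the identity (as `coverage2` does here) is one way to spot such pairs before trusting the matrix.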