I have a set of 520 influenza sequences for which I have already computed a multiple sequence alignment and the pairwise identity (PWI) matrix. If I want to add another sequence, I currently have to re-align everything and recompute the entire PWI matrix. Is there a program I can use to "append" the new sequence to the existing alignment and compute only its PWI against each of the existing sequences?
A simple example would be as follows. I have an alignment of two sequences, with the following 2x2 PWI matrix.
      SeqA  SeqB
SeqA  1.00  0.98
SeqB  0.98  1.00
Without re-running the full alignment, and by comparing only "SeqC" against each of the other sequences, I'd like to get the following matrix:
      SeqA  SeqB  SeqC
SeqA  1.00  0.98  0.99
SeqB  0.98  1.00  0.97
SeqC  0.99  0.97  1.00
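To make the desired update concrete, here is a minimal sketch in plain Python. It assumes the new sequence has already been placed into the same columns as the existing alignment (how to do that is the open question), and it assumes identity is counted only over columns where neither sequence of the pair has a gap. The function names and the toy sequences are hypothetical, chosen only for illustration; they are not the real influenza data or the scores shown above.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence is gapped
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def append_to_matrix(names, aligned, matrix, new_name, new_aligned):
    """Return (names, matrix) with one extra row and column; old entries are reused as-is."""
    new_scores = [pairwise_identity(new_aligned, s) for s in aligned]
    # Extend every existing row by one column, then add the new row.
    new_matrix = [row + [score] for row, score in zip(matrix, new_scores)]
    new_matrix.append(new_scores + [1.0])
    return names + [new_name], new_matrix


# Toy, made-up aligned sequences (hypothetical data).
names = ["SeqA", "SeqB"]
aligned = ["ACGTACGTAC", "ACGTACGTAA"]
matrix = [[1.0, 0.9], [0.9, 1.0]]

names2, matrix2 = append_to_matrix(names, aligned, matrix, "SeqC", "ACGTACGAAC")
for name, row in zip(names2, matrix2):
    print(name, " ".join(f"{v:.2f}" for v in row))
```

With 520 existing sequences this would cost 520 comparisons for the new row and column, instead of recomputing all pairs.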
I am using the Biopython package, and Python is my preferred language, but I'm okay with Java too if need be.
[I'll disclaim here that I'm cross-posting from StackOverflow, just in case there are experts here who aren't on SO.]
I don't know about PWI, but when computing distances, all informative positions are counted from each column. Adding or removing a sequence would affect those counts, and thus the distances would change.
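A small sketch of one way this can happen, under an assumption of mine: that distances are computed only on columns with no gap in any sequence (often called complete deletion). The function name and the toy sequences are hypothetical. If gaps are instead handled pair by pair and the existing columns are left untouched, the existing entries would not be expected to change.

```python
def identity_complete_deletion(alignment, i, j):
    """Identity between rows i and j, using only columns with no gap in ANY row."""
    usable = [col for col in zip(*alignment) if "-" not in col]
    if not usable:
        return 0.0
    matches = sum(1 for col in usable if col[i] == col[j])
    return matches / len(usable)


# Toy alignment of two sequences (hypothetical data).
before = ["ACGTACGTAC", "ACGTACGTAA"]
# Adding a third sequence with a gap in the last column removes that
# column from the comparison for every pair, including the first two rows.
after = before + ["ACGTACGTA-"]

score_before = identity_complete_deletion(before, 0, 1)
score_after = identity_complete_deletion(after, 0, 1)
print(score_before, score_after)
```

Here the score between the first two sequences changes once the third is added, because the only column in which they differ is no longer counted.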