NUMBER OF POSSIBLE PAIRWISE ALIGNMENTS:
The number of pairwise alignments increases factorially with the lengths of the sequences. If you consider Non-Boring Alignments(NBA) (where gaps are always paired with nucleotides) the number of all possible alignments of two sequences of lengths m
and n
is (m+n)!/m!*n!
, so the formular for two sequences of equal length is so (2n)!/(n!)^2
which is often approximated (for practical reasons) to 2^2n/(π*n)^-0.5
[Lange (2002), Eddy, Nature Biotechnology (2004)].
For some small sequences' lengths, the number of possible alignments is already too big; for example fot two sequences of 11 nucleotides, there are 705,432 possible alignments, and for two sequences of 100 nucleotides, the number is astronomical (> 10^60).
NUMBER OF POSSIBLE MULTIPLE SEQUENCE ALIGNMENTS:
As for the number of possible alignments for more than two sequences, you can find the formula in Slowinski, Mol. Phylogenet. Evol (1998). Accordingly, there are 16,081 different alignments for 3 sequences of 3 characters each and 1.05*10^18 different alignments for 5 sequences of length of 5 nucleotides.
As a result, the calculation of an exact alignment becomes infeasible for relatively small sets of comparatively short sequences. For example, the dynamic programming algorithm implemented in the MSA program (Lipman et al., 1989) allows optimal alignment of up to roughly 10 protein sequences of small lengths of 200-300 residues.
That's wrong. According to your formula, two length-2 strings have 6 possible alignments. Actually, they could align in any of these ways (where M=match/mismatch, C=clip, D=deletion, I=insertion):
This assumes the sequences overlap and no operations are allowed for characters not present in some input string; otherwise, there would be infinite possible alignments.
The number of possible pairwise alignments is a Delannoy number and is equivalent to traversing a grid of size m by n from the bottom-left corner to the top-right corner where horizontal, vertical, and diagonal steps are allowed (the traceback in the Needleman-Wunsch algorithm).
How do you define clip?