For the last week I have been attempting to test ClustalW to confirm that the software is indeed working correctly with a custom nucleotide scoring matrix. Consider the following as a simple test, I wish to align the two nucleotide sequences using the example given here by Wikipedia for Needleman-Wunch alignment. To do this I planned to use the profile alignment option of ClustalW which should perform the standard dynamic programming alignment between two sequences or profiles (not full multiple sequence alignment, just a single pairwise profile alignment)
This problem was previously pointed out here on BioStars however it was never addressed.
The Wikipedia Example:
Align the following two sequences: GCATGCU & GATTACA;
With the following rules: Mismatch = -1; Match= + 1; Gap = -1;
This alignment results in 3 co-optimal solutions. To test if ClustalW performs as expected I created the three files:
SeqA.fa
>SeqA
GCATGCU
SeqB.fa
>SeqB
GATTACA
matrix.txt
A G C T U *
A 1 -1 -1 -1 -1 -1
G -1 1 -1 -1 -1 -1
C -1 -1 1 -1 -1 -1
T -1 -1 -1 1 -1 -1
U -1 -1 -1 -1 1 -1
* -1 -1 -1 -1 -1 -1
Now running the command:
clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -DNAMATRIX=matrix.txt -TYPE=DNA
By examining the program via debug flags it becomes apparent that the scoring scheme being used is the IUB scheme (1.9 for match, 0 for mismatch). One can get the exact same results and scores (add -DEBUG=5 to the command) using the command:
clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -DNAMATRIX=IUB -TYPE=DNA
or even removing the matrix completely as in:
clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -TYPE=DNA
So it appears that ClustalW is completely ignoring a custom nucleotide matrix and not throwing any errors. I have tested multiple different ways to try get the program to score the alignment correctly but have been unable to (different formats of the substitution matrices, putting the sequences in one file, trying to perform the alignment as protein sequences etc). It is obviously reading the nucleotide matrix as it will throw an error if it does not exist or if it is in the wrong format, it just seems to ignore it completely when it actually performs the alignment.
If anyone has any suggestions in would be much appreciated as I am starting to bang my head against a wall. Also, if anyone can suggest a program which can perform pairwise profile alignments using standard dynamic programming it would also be useful for testing.