Question

ClustalW nucleotide alignment with a custom scoring matrix (possible bug?)

4

Entering edit mode

10.4 years ago

evanfloden ▴ 200

For the last week I have been attempting to test ClustalW to confirm that the software is indeed working correctly with a custom nucleotide scoring matrix. Consider the following as a simple test, I wish to align the two nucleotide sequences using the example given here by Wikipedia for Needleman-Wunch alignment. To do this I planned to use the profile alignment option of ClustalW which should perform the standard dynamic programming alignment between two sequences or profiles (not full multiple sequence alignment, just a single pairwise profile alignment)

This problem was previously pointed out here on BioStars however it was never addressed.

The Wikipedia Example:

Align the following two sequences: GCATGCU & GATTACA;

With the following rules: Mismatch = -1; Match= + 1; Gap = -1;

This alignment results in 3 co-optimal solutions. To test if ClustalW performs as expected I created the three files:

SeqA.fa

>SeqA
GCATGCU

SeqB.fa

>SeqB
GATTACA

matrix.txt

   A   G   C   T   U   *
A  1   -1  -1  -1  -1  -1
G  -1  1   -1  -1  -1  -1
C  -1  -1  1   -1  -1  -1
T  -1  -1  -1  1   -1  -1
U  -1  -1  -1  -1  1   -1
*  -1  -1  -1  -1  -1  -1

Now running the command:

clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -DNAMATRIX=matrix.txt -TYPE=DNA

By examining the program via debug flags it becomes apparent that the scoring scheme being used is the IUB scheme (1.9 for match, 0 for mismatch). One can get the exact same results and scores (add -DEBUG=5 to the command) using the command:

clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -DNAMATRIX=IUB -TYPE=DNA

or even removing the matrix completely as in:

clustalw2 -PROFILE -PROFILE1=SeqA.fa -PROFILE2=SeqB.fa -GAPOPEN=1 -GAPEXT=1 -TYPE=DNA

So it appears that ClustalW is completely ignoring a custom nucleotide matrix and not throwing any errors. I have tested multiple different ways to try get the program to score the alignment correctly but have been unable to (different formats of the substitution matrices, putting the sequences in one file, trying to perform the alignment as protein sequences etc). It is obviously reading the nucleotide matrix as it will throw an error if it does not exist or if it is in the wrong format, it just seems to ignore it completely when it actually performs the alignment.

If anyone has any suggestions in would be much appreciated as I am starting to bang my head against a wall. Also, if anyone can suggest a program which can perform pairwise profile alignments using standard dynamic programming it would also be useful for testing.

ClustalW alignment bug • 3.7k views

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.4 years ago by evanfloden ▴ 200

Ram · Answer 1 · 2015-03-30

2

Entering edit mode

10.3 years ago

evanfloden ▴ 200

After several hours of trial and error I discovered that there is a flag regarding transition weightings that overrides the custom matrix.

I believe that -TRANSWEIGHT=f must be set to 0 otherwise the custom matrix is ignored. Slightly obtuse use of flag in my opinion.

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.3 years ago by evanfloden ▴ 200