needleman-wunsch algorithm on python
2
2
Entering edit mode
7.9 years ago
elisheva ▴ 120

hello everyone.

I have to execute the needleman-wunsch algorithm on python for global sequence alignment.

I have to fill 1 matrix withe all the values according to the penalty of match, mismatch, and gap.

And another matrix as pointers matrix - where "v" for vertical, "H" for horizontal and "D" for diagonal.

for example: for the sequences: TCGCA and TCCA, match:1, mismatch:-1 and gap:-2

the output should be:

           T   C   G   C   A
   [[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'H' 'H' 'H' 'H']
C ['V' 'V' 'D' 'H' 'D' 'H']
C ['V' 'V' 'D' 'D' 'D' 'H']
A ['V' 'V' 'V' 'D' 'D' 'D']]

And my program output is:

           T   C   G   C   A
   [[  0  -2  -4  -6  -8 -10]
T  [ -2   1  -1  -3  -5  -7]
C  [ -4  -1   2   0  -2  -4]
C  [ -6  -3   0   1   1  -1]
A  [ -8  -5  -2  -1   0   2]]

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]

This is my code:

import numpy as np
from string import *

#-------------------------------------------------------
#This function returns to values for cae of match or mismatch
def Diagonal(n1,n2,pt):
    if(n1 == n2):
        return pt['MATCH']
    else:
        return pt['MISMATCH']

#------------------------------------------------------------   
#This function gets the optional elements of the aligment matrix and returns the elements for the pointers matrix.
def Pointers(di,ho,ve):

    pointer = max(di,ho,ve) #based on python default maximum(return the first element).

    if(di == pointer):
        return 'D'
    elif(ho == pointer):
        return 'H'
    else:
         return 'V'    

#--------------------------------------------------------
#This function creates the aligment and pointers matrices
def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
    penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
    n = len(s1) + 1 #The dimension of the matrix columns.
    m = len(s2) + 1 #The dimension of the matrix rows.
    al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
    p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
    #Scans all the first rows element in the matrix and fill it with "gap penalty"
    for i in range(m):
        al_mat[i][0] = penalty['GAP'] * i
        p_mat[i][0] = 'V'
    #Scans all the first columns element in the matrix and fill it with "gap penalty"
    for j in range (n):
        al_mat[0][j] = penalty['GAP'] * j
        p_mat [0][j] = 'H'
    #Fill the matrix with the correct values.

    p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
    for i in range(1,m):
        for j in range(1,n):
            di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
            ho = al_mat[i-1][j] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
            ve = al_mat[i][j-1] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
            al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
            p_mat[i][j] = Pointers(di,ho,ve)
    print np.matrix(al_mat) 
    print np.matrix(p_mat)

Can anybody find what the problem is?

Thanks!!

python • 31k views
ADD COMMENT
6
Entering edit mode
7.9 years ago

There are a couple problems, not least of which that what you think your program should output is wrong. Below is the corrected program:

def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
    penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
    n = len(s1) + 1 #The dimension of the matrix columns.
    m = len(s2) + 1 #The dimension of the matrix rows.
    al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
    p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
    #Scans all the first rows element in the matrix and fill it with "gap penalty"
    for i in range(m):
        al_mat[i][0] = penalty['GAP'] * i
        p_mat[i][0] = 'V'
    #Scans all the first columns element in the matrix and fill it with "gap penalty"
    for j in range (n):
        al_mat[0][j] = penalty['GAP'] * j
        p_mat [0][j] = 'H'
    #Fill the matrix with the correct values.

    p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
    for i in range(1,m):
        for j in range(1,n):
            di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch -  diagonal.
            ho = al_mat[i][j-1] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
            ve = al_mat[i-1][j] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
            al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
            p_mat[i][j] = Pointers(di,ho,ve)
    print np.matrix(al_mat)
    print np.matrix(p_mat)
  1. ho and ve were swapped
  2. di and so on were using the updated penalties if a given path were followed. Instead, you want to go to the greater of the neighboring points.
ADD COMMENT
0
Entering edit mode

...what you think your program should output is wrong.

Ooops, I didn't consider that :)

ADD REPLY
0
Entering edit mode

I didn't understand why my program should output is wrong. Isn't mismatch better than gap? And also according to ncbi that's realy the right answer:

Query  1  TCGCA  5
          || ||
Sbjct  1  TC-CA  4
ADD REPLY
0
Entering edit mode

In and of itself, yes, but the neighboring scores represent the score of the best-possible path leading to that point...

ADD REPLY
0
Entering edit mode

But what you'v changed does'nt work on this case. its output is:

[[  0  -2  -4  -6  -8 -10]
 [ -2   1  -1  -3  -5  -7]
 [ -4  -1   2   0  -2  -4]
 [ -6  -3   0   1   1  -1]
 [ -8  -5  -2  -1   0   2]]

[['0' 'H' 'H' 'H' 'H' 'H']
 ['V' 'D' 'H' 'H' 'H' 'H']
 ['V' 'V' 'D' 'H' 'H' 'H']
 ['V' 'V' 'V' 'D' 'H' 'H']
 ['V' 'V' 'V' 'V' 'D' 'D']]

And as I wrote on top the ncbi indeed gives my expected result:

Query  1  TCGCA  5
          || ||
Sbjct  1  TC-CA  4
ADD REPLY
0
Entering edit mode

Oh, wait a second. It looks to me like your initial output was already correct and it's just your expected output that's wrong.

ADD REPLY
0
Entering edit mode

No, my output is wrong. 'v' means up and 'H' left

ADD REPLY
0
Entering edit mode

Yep, that's correct. So:

         T   C   G   C   A
  [['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]

That looks correct to me. A path to get to the bottom right could horizontally along the top for a while (through Hs) then down for a while (through Vs) before it gets to a D and finishes along the diagonal. That's what it's supposed to look like.

ADD REPLY
0
Entering edit mode

Indeed, ignore my point 2, it was just the swapping of ve and ho. That's what I get for not checking.

I've corrected the code in the answer.

ADD REPLY
0
Entering edit mode
7.9 years ago

Looks like you can change Pointers like this:

if(di == pointer):
    return 'D'
elif(ho == pointer):
    return 'V'
else:
     return 'H'
ADD COMMENT
0
Entering edit mode

It does'nt explain the problem

ADD REPLY

Login before adding your answer.

Traffic: 1881 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6