hello everyone.
I have to execute the needleman-wunsch algorithm on python for global sequence alignment.
I have to fill 1 matrix withe all the values according to the penalty of match, mismatch, and gap.
And another matrix as pointers matrix - where "v" for vertical, "H" for horizontal and "D" for diagonal.
for example: for the sequences: TCGCA and TCCA, match:1, mismatch:-1 and gap:-2
the output should be:
T C G C A
[[ 0 -2 -4 -6 -8 -10]
T [ -2 1 -1 -3 -5 -7]
C [ -4 -1 2 0 -2 -4]
C [ -6 -3 0 1 1 -1]
A [ -8 -5 -2 -1 0 2]]
T C G C A
[['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'H' 'H' 'H' 'H']
C ['V' 'V' 'D' 'H' 'D' 'H']
C ['V' 'V' 'D' 'D' 'D' 'H']
A ['V' 'V' 'V' 'D' 'D' 'D']]
And my program output is:
T C G C A
[[ 0 -2 -4 -6 -8 -10]
T [ -2 1 -1 -3 -5 -7]
C [ -4 -1 2 0 -2 -4]
C [ -6 -3 0 1 1 -1]
A [ -8 -5 -2 -1 0 2]]
T C G C A
[['0' 'H' 'H' 'H' 'H' 'H']
T ['V' 'D' 'V' 'V' 'V' 'V']
C ['V' 'H' 'D' 'V' 'D' 'V']
C ['V' 'H' 'D' 'D' 'D' 'V']
A ['V' 'H' 'H' 'D' 'D' 'D']]
This is my code:
import numpy as np
from string import *
#-------------------------------------------------------
#This function returns to values for cae of match or mismatch
def Diagonal(n1,n2,pt):
if(n1 == n2):
return pt['MATCH']
else:
return pt['MISMATCH']
#------------------------------------------------------------
#This function gets the optional elements of the aligment matrix and returns the elements for the pointers matrix.
def Pointers(di,ho,ve):
pointer = max(di,ho,ve) #based on python default maximum(return the first element).
if(di == pointer):
return 'D'
elif(ho == pointer):
return 'H'
else:
return 'V'
#--------------------------------------------------------
#This function creates the aligment and pointers matrices
def NW(s1,s2,match = 1,mismatch = -1, gap = -2):
penalty = {'MATCH': match, 'MISMATCH': mismatch, 'GAP': gap} #A dictionary for all the penalty valuse.
n = len(s1) + 1 #The dimension of the matrix columns.
m = len(s2) + 1 #The dimension of the matrix rows.
al_mat = np.zeros((m,n),dtype = int) #Initializes the alighment matrix with zeros.
p_mat = np.zeros((m,n),dtype = str) #Initializes the alighment matrix with zeros.
#Scans all the first rows element in the matrix and fill it with "gap penalty"
for i in range(m):
al_mat[i][0] = penalty['GAP'] * i
p_mat[i][0] = 'V'
#Scans all the first columns element in the matrix and fill it with "gap penalty"
for j in range (n):
al_mat[0][j] = penalty['GAP'] * j
p_mat [0][j] = 'H'
#Fill the matrix with the correct values.
p_mat [0][0] = 0 #Return the first element of the pointer matrix back to 0.
for i in range(1,m):
for j in range(1,n):
di = al_mat[i-1][j-1] + Diagonal(s1[j-1],s2[i-1],penalty) #The value for match/mismatch - diagonal.
ho = al_mat[i-1][j] + penalty['GAP'] #The value for gap - horizontal.(from the left cell)
ve = al_mat[i][j-1] + penalty['GAP'] #The value for gap - vertical.(from the upper cell)
al_mat[i][j] = max(di,ho,ve) #Fill the matrix with the maximal value.(based on the python default maximum)
p_mat[i][j] = Pointers(di,ho,ve)
print np.matrix(al_mat)
print np.matrix(p_mat)
Can anybody find what the problem is?
Thanks!!
Ooops, I didn't consider that :)
I didn't understand why my program should output is wrong. Isn't mismatch better than gap? And also according to ncbi that's realy the right answer:
In and of itself, yes, but the neighboring scores represent the score of the best-possible path leading to that point...
But what you'v changed does'nt work on this case. its output is:
And as I wrote on top the ncbi indeed gives my expected result:
Oh, wait a second. It looks to me like your initial output was already correct and it's just your expected output that's wrong.
No, my output is wrong. 'v' means up and 'H' left
Yep, that's correct. So:
That looks correct to me. A path to get to the bottom right could horizontally along the top for a while (through Hs) then down for a while (through Vs) before it gets to a D and finishes along the diagonal. That's what it's supposed to look like.
Indeed, ignore my point 2, it was just the swapping of
ve
andho
. That's what I get for not checking.I've corrected the code in the answer.