Hi everyone, I have many doubts about dynamic programming, in particular, I am interested about global alignment of two sequences considering both gap insertion and gap extension penalties.
I hope strongly that someone of you can help me. I am desperate.
I know that to compute the best global alignment of two sequences with dynamic programming I must follow this three steps:
- Initialization of the score and the traceback matrices
- Calculation of scores and filling in the score and traceback matrices
- Inferring the alignment from the traceback matrix
We suppose that we have these two sequences:
seq 1 = ATTCGGTGA (length 9)
seq 2 = ACCCTGG (length 7)
and gap insertion = -10, gap extension = -2.
I create a matrix 10x8 initialized with 0 (I usually call this matrix "alignment matrix") and another matrix of the same sizes that I call "directions matrix".
In the first matrix I insert in each cell the maximum value of three possibles:
- diag = S(i – 1, j – 1 ) + r(i, j) // r is char replacement score
- up = S(i – 1, j) + gap_function(row_index, col_index, arrow, current_direction, gap_ins, gap_ext)
- left = S(i, j – 1) + gap_function(row_index, col_index, arrow, current_direction, gap_ins, gap_ext)
(I know that to choose the r(i, j) I need to substitution matrix)
Gap_function is a function that decide if add gap insertion or gap extension penalty based on the arrow in the cell where current cell value (in alignment matrix) come from.
This is Python pseudo-code:
gap_function (row_index, col_index, arrow, current_direction, gap_ins, gap_ext):
return gap_ins if arrow != current_direction
gap_ext otherwise
At this link is possible see my alignment matrix: https://ibb.co/qs4JNt3
In the second matrix I simultaneously insert the direction from where value derived. For example, if the maximum value inserted in the cell (i,j) of the alignment matrix derived from diagonal, in cell (i,j) of directions matrix I insert "diag". Important! The first row of directions matrix (except at position 0) is initialized with "left" and the first column with "up" (also in this case, except at position 0), like this:
I know that in global alignment to reconstruct the aligment is used traceback, that consist on start from the last cell (bottom-right) of the directions matrix and return to the first cell (up-left).
So, I start with the first questions:
Why to reconstruct alignment is used traceback? Is not right to start from the fist cell of directions matrix? Furthermore, is compulsory to start from bottom-right cell? Why?
The last questions.
Before I wrote that alignment matrix is initialized to zeros, but maybe is more correct initialize first row and first column with gap penalities? If yes, in what way? Should I consider both gap insertion and gap extension penalties? I should initialize my alignment matrix in this way (see link please)?
Thank you so much.
How to add images to a Biostars post