I'm writing code for a pair HMM based on the information provided in Durbin et al's Biological Sequence Analysis (10th reprint). I understand the global alignment model described in chapter 4, but am having some trouble with the local alignment model. Specifically, I'm not sure how to initialize the silent states for the forward algorithm.
I have the following for recursion, where S1-4 are the silent states (in order) in the flanking random models for figure 4.3:
f_S1(i, j) = [n * ( f_B(i, j) + f_RX1(i, j) )]
f_S2(i, j) = [n * ( f_S1(i, j) + f_RY1(i, j) )]
f_S3(i, j) = [t * ( f_S2(i, j) + f_M(i, j) + f_X(i, j) + f_Y(i, j) )]
f_S4(i, j) = [n * ( f_S3(i, j) + f_RX2(i, j) )]
For non-silent states:
f_RX1(i, j) = qxi * [(1-n) * (f_B(i-1, j) + f_RX1(i-1, j) )]
f_RY1(i, j) = qyj * [(1-n) * (f_S1(i, j-1) + f_RY1(i, j-1) )]
f_M(i, j) = pxiyj * [( (1-2d-t) * (f_S2(i-1, j-1) + f_M(i-1, j-1) ))+ ( (1-e-t) * (f_X(i-1, j-1) + f_Y(i-1, j-1) ))]
f_X(i, j) = qxi * [( d * (f_S2(i-1, j) + f_M(i-1, j) )) + (e * f_X(i-1, j) )]
f_Y(i, j) = qyj * [( d * (f_S2(i, j-1) + f_M(i, j-1) )) + (e * f_Y(i, j-1) )]
f_RX2(i, j) = qxi * [(1-n) * (f_S3(i-1, j) + f_RX2(i-1, j) )]
f_RY2(i, j) = qyj * [(1-n) * (f_S4(i, j-1) + f_RY2(i, j-1) )]
Where n, t, d, e
are transition probabilities and pxiyj, qxi, qyj
are emission probabilities.
This is based on the tidbit of info provided on silent states near the end of chapter 3.4, and forward algorithm for the global model described in chapter 4.2
I have everything initialized to 0, except for f_B(0, 0) = 1
so the model starts in the begin state. But the f scores don't update because nothing points to f_B(0, 0)
and everything gets multiplied by 0. Should the silent states be initialized differently, or should the recursion be done differently?
Thanks
I am not sure I understand the question. The silent states are states that don't emit any symbol so their emission probabilities are 0. However, you still need to assign them transition probabilities to other states.
Sorry, I'm wondering how to implement the forward algorithm in the local alignment pair HMM described by Durbin et al, which includes four silent states. The book excplicitly describes the forward algorithm for the global alignment pair HMM, but not how to make changes to include the silent states and random sequence flanking models for the local alignment pair HMM.
If you enter a silent state, there's no emission so you need to get rid of the emission probability terms and since no symbol is emitted, the sequence index is not incremented, i.e. when looking at the traditional dynamic programming table, for a silent state, you look for the non-silent states in the same column as the table cell you're computing. The modification is mentioned on p71 of the Durbin book.
That was my understanding, and that's how I've coded my silent states. I've added more info to my question to hopefully show this. I imagine I am initializing the forward algorithm incorrectly then, because the f scores remain at 0 and the sum probability is 0. How do you initialize when dealing with silent states?
I am not sure where your problem is. I suspect it has to do with your code. Referring to the dynamic programming table, in the forward algorithm, you start from the top left set to 1 (begin state at position 0) and the rest of the first column (corresponding to position 0) is set to 0 (since we must be in the begin state) then the rest of the first row (corresponding to the begin state) is set to 0 because you never go back to the begin state. Then you move to the second column (position 1) and compute the probabilities for each cell then move to column 3 (position 2) and so on, adding the probabilities from the previous column. Maybe this tutorial can help explain better than I can. It has a worked example for the Viterbi algorithm but the idea is the same for the forward algorithm.
Ok, working with an example output sequence of:
Which can also be considered
mismatch match match (m M M)
.I set up my table:
For position 1, the Begin state only transitions to RX1 and S1, so:
For RX1, the probability of emitting a mismatch is 0, so:
For S1, I don't refer to the previous position because it is a silent state. So I sum the transition probabilities times the probabilities in the current position for the states that transition into S1, i.e. B and RX1. But both of those numbers are 0:
Now that my second column is all zeros, the rest of the table can't possibly be anything but zeros. What am I doing wrong?
I think the error you're making is that you're thinking in terms of emission probabilities. That's not what the cells represent. A cell (s,p) contains the probability of being in the state s at position p given the sequence at the previous positions. A silent state still has transition probability so you can get out of it.
No I understand that. When you calculate the cell, it's
Where s=state, p=position, e_s=emission probability of state s, S_p=symbol at position p, a_k=transition probability of previous state k that transitions into state s