This is a reply, not an answer... I understand the use of j-i +1. Without this the initialisation doesn't make sense.
I appreciate your help, Thank you, although it has brought about a few more questions...
So I understand that i am comparing 1 RNA sequence with itself. If I was doing an alignment I would be inserting gaps and such, so I'm not doing that in this case.
The RNA sequence can also be written on the top and left hand side of the matrix, along with the axes numbers and the coordinates for each cell in the matrix. N[i1, j1] excetera.
Can I assume that a sequence of length 0 consists of no ribonucleotides and a sequence of length 1 consists of a single ribonucleotide? Then how do i know what a sequence of length 3 or 4 looks like? and how do I know if the sequence of length 3 or 4 is base paired or not and where the base pairs are?
After initialisation, is it okay to start with sequences of length 2, and see if they base pair or not? Is this the minimum length I can start with, or should i make sure that it is of length 3, so that there is a guaranted base pair somewhere?
Is the output of this recursion the structure of the base paired RNA in dot bracket form or is it the structure of the base paired RNA in ribonucleotide letters?
During recursion, why should I traverse through and fill up the matrix in rows of negative diagonals? Is there a special reason for this?
What is case one of the scoring function?
Also, what am I doing with the Nussinov Algorithm? Am I looking for the structure that has the most base pairs? Am I looking for the structure with the most-likely number of base pairs(not the most, not the least)?
If the scoring functions are:
score i, j-1
score( i+1, j-1) + different_score (i, j)
score (i + 1, j)
Is there a particular one that should be used first, second and/or third, or does it not matter?
Maybe these aren't scoring functions. If not, then the scoring function is put a 1 if there is a base pair that can occur and a 0 if a base pair cannot occur.
Yes.
In cell (i,j) you’re looking at the maximum number of basepairs that can occur in subsequence S[i..j]. At that point you’re determining if bases i and j forming a pair is better than any of the other possibilities (i unpaired, j unpaired, bifurcation). For a sequence of length N, you’re looking at a potential basepair between bases 1 and N, but all you know at that point (during filling) is how many other basepairs are in the structure, not where they are. Those details don’t matter to the algorithm at that point.
You’re doing sequences of length 2, 3, 4, …, N. That is why you fill the diagonals in that particular manner, i.e. each diagonal represents sequences of a certain length. (Remember, main diagonal: length 1). Concerning “guaranteed” basepairs: by default, the Nussinov algorithm will model basepairs even between adjacent bases, i.e. in sequences of length 2. A common homework assignment is to modify the algorithm so that it does obey a minimum hairpin loop size of 3 bases.
The output of the “fill”-recursion is the filled matrix containing the maximum numbers of base pairs. After traceback recursion, you will have a list of pairs (i, j), which you then can use to visualise the structure however you like. What is case one of the scoring function?
In my book that is the (i + 1, j), which would be: what score does an unpaired base i contribute to the structure.
You’re looking for the maximum number of basepairs that a certain sequence can -- theoretically -- contain according to a simple scoring model.
During fill, you’re taking the max anyway. During traceback, it’s your choice.
It is “kind” of a scoring function, just a very simple one.