I'm going to be doing some non-linear regression (with a huge and messy residual function), and I am thinking of using PDL::Fit::LM (I had some trouble getting Levmar to install).
The explanatory variables for my fit are DNA sequences (which I'm feeding into a position-specific weight matrix). What's the easiest way to put a DNA sequence into a piddle? Given that the function I'm working with is a big mess, performance is a consideration.
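One common way to get a sequence into a numeric array quickly is a single table-driven translation from characters to small integer codes, rather than branching on each character. Below is a minimal pure-Python sketch of that idea, assuming only A/C/G/T input (case-insensitive). `to_codes` and `CODE_TABLE` are hypothetical names for illustration, not part of PDL or any other library. The resulting integer list is the kind of thing that can be handed to a numeric array library (a piddle, a NumPy array) and used to index rows of a per-nucleotide encoding matrix.

```python
# Hypothetical helper: map bases to integer codes (A=0, C=1, G=2, T=3) in one pass.
# bytes.maketrans builds a 256-entry lookup table; bytes.translate applies it in C.
CODE_TABLE = bytes.maketrans(b"ACGTacgt", bytes([0, 1, 2, 3, 0, 1, 2, 3]))
VALID = frozenset(b"ACGTacgt")  # set of allowed byte values


def to_codes(seq):
    """Return a list of integer codes, one per nucleotide in seq."""
    raw = seq.encode("ascii")
    bad = set(raw) - VALID
    if bad:
        # Reject N, gaps, etc. instead of silently passing them through.
        raise ValueError(f"unexpected characters: {sorted(chr(b) for b in bad)}")
    return list(raw.translate(CODE_TABLE))


if __name__ == "__main__":
    print(to_codes("ACGTacgt"))
```

Doing the translation once up front, outside the residual function, keeps the per-iteration work of the fit down to pure arithmetic on arrays.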
Since my weight matrix is constrained so that the weights at each position sum to zero, my current plan is to represent each nucleotide as a vector of three elements: A -> [1,0,0], C -> [0,1,0], G -> [0,0,1], T -> [-1,-1,-1]. Only the A, C and G weights are then free parameters, and the T weight is implied as minus their sum. This way I can take a subsequence of my total sequence, multiply it with my weight matrix and get the score directly.
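To check that the three-element encoding really reproduces a zero-sum four-row weight matrix, here is a minimal pure-Python sketch under stated assumptions: the weight matrix is given as one `(wA, wC, wG)` tuple per position, and `encode`, `score`, `full_matrix_score` and `scan` are hypothetical names for illustration only. A vectorised library (PDL or NumPy) would replace the inner loops with a matrix product; this version only shows the arithmetic.

```python
# Hypothetical sketch: constrained encoding and position-weight-matrix scoring.
ENCODING = {
    "A": (1, 0, 0),
    "C": (0, 1, 0),
    "G": (0, 0, 1),
    "T": (-1, -1, -1),  # dot with (wA, wC, wG) gives -(wA + wC + wG) = wT
}


def encode(seq):
    """Return one 3-vector per nucleotide."""
    return [ENCODING[base] for base in seq.upper()]


def score(encoded_window, weights):
    """Sum over positions of dot(encoded base, free weights at that position)."""
    if len(encoded_window) != len(weights):
        raise ValueError("window length must match weight-matrix length")
    return sum(
        sum(x * w for x, w in zip(vec, pos_w))
        for vec, pos_w in zip(encoded_window, weights)
    )


def full_matrix_score(seq, weights):
    """Reference: expand each position to four zero-sum weights and look up the base."""
    total = 0
    for base, (wa, wc, wg) in zip(seq.upper(), weights):
        full = {"A": wa, "C": wc, "G": wg, "T": -(wa + wc + wg)}
        total += full[base]
    return total


def scan(seq, weights):
    """Score every window of len(weights) along seq."""
    width = len(weights)
    enc = encode(seq)
    return [score(enc[i:i + width], weights) for i in range(len(enc) - width + 1)]


if __name__ == "__main__":
    w = [(2, -1, 0), (-3, 1, 1)]  # made-up weights for a width-2 matrix
    print(scan("ACGT", w))
```

With the made-up weights above, `scan("ACGT", w)` scores the windows AC, CG and GT. Keeping only three free weights per position also means the optimiser never sees the redundant direction that a four-weight parametrisation plus a zero-sum constraint would introduce.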
+1 for the most amusing BioStar title to date.
What's your question? Seems like you've answered it yourself.
Others have successfully used PDL for encoding alignments and other DNA-related stuff. Too bad the PDL documentation is terrible.
@qdjm I'm just guessing that I'm not the first person to do this, and I'm wondering what solutions people have come up with. I mention my current idea as a point of reference.