I'm trying to implement a javascript-based search of PWM in a DNA sequence. I've been inspired by the following post: http://davetang.org/muse/2013/10/01/position-weight-matrix/ and by the R/Biostrings-code https://github.com/genome-vendor/r-bioc-biostrings/blob/master/src/match_PWM.c. See my code below.
I'm starting from a PFM, and I convert it to PWM using the algorithm described in http://davetang.org/muse/2013/10/01/position-weight-matrix/. I get the same results:
PWM input
A 0 4 4 0 3 7 4 3 5 4 2 0 0 4
C 3 0 4 8 0 0 0 3 0 0 0 0 2 4
G 2 3 0 0 0 0 0 0 1 0 6 8 5 0
T 3 1 0 0 5 1 4 2 2 4 0 0 1 0
PWM output
A -1.936751795439824 0.7980887856393778 0.7980887856393778 -1.936751795439824 0.4535418763586175 1.5094375840527072 0.7980887856393778 0.4535418763586175 1.0760077609593484 0.7980887856393778 0 -1.936751795439824 -1.936751795439824 0.7980887856393778
C 0.4535418763586175 -1.936751795439824 0.7980887856393778 1.6854416207624505 -1.936751795439824 -1.936751795439824 -1.936751795439824 0.4535418763586175 -1.936751795439824 -1.936751795439824 -1.936751795439824 -1.936751795439824 0 0.7980887856393778
G 0 0.4535418763586175 -1.936751795439824 -1.936751795439824 -1.936751795439824 -1.936751795439824 -1.936751795439824 -1.936751795439824 -0.665198492276212 -1.936751795439824 1.308938775371164 1.6854416207624505 1.0760077609593484 -1.936751795439824
T 0.4535418763586175 -0.665198492276212 -1.936751795439824 -1.936751795439824 1.0760077609593484 -0.665198492276212 0.7980887856393778 0 0 0.7980887856393778 -1.936751795439824 -1.936751795439824 -0.665198492276212 -1.936751795439824
Searching the motif in the DNA sequence : I'm not confident about my code.
Score("GAAATTACAAGGG"): null
Score("ttacataagtagtc"): null
Score("ttacataACCagtc"): null
Score("CAACTAAAAAGGGC"): 2.692960768092946
My main problem is about the min_score
https://github.com/genome-vendor/r-bioc-biostrings/blob/master/src/match_PWM.c. As far as I understand this score should is a fraction 0<=min_score<1 . But (am I wrong ?), there is no normalization here on the X-axis. So is my code correct and if no, how should I fix it?
Thank you,
Pierre
The script that is mentioned appears to be available here