9.9 years ago
Affan
I have a position frequency matrix (PFM) that I want to convert to a position weight matrix (PWM) using the PWM() function from Bioconductor in R.
The PFM is:
A 271 342 445 1017 547 648 673 722 660 935 793 531
C 262 673 316 155 80 54 70 90 67 55 88 100
G 98 83 75 58 43 41 56 126 254 539 443 220
T 532 468 872 597 1046 891 991 855 879 267 412 153
M 0 0 0 0 0 0 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 0 0 0 0
W 0 0 0 0 0 0 0 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0
Y 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 0 0 0 0 0 0
V 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 0 0 0 0 0 0 0 0 0
B 0 0 0 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 0 0 0 0
- 712 309 167 48 159 241 85 82 15 79 139 871
+ 0 0 0 0 0 0 0 0 0 0 0 0
. 0 0 0 0 0 0 0 0 0 0 0 0
The problem is that the "-" row has non-zero counts, which come from gaps in the alignment (done with ClustalW).
My main question: would it be acceptable to redistribute the "-" counts equally among the remaining bases? I've also heard the suggestion of ignoring the "-" row and just using the PFM as the PWM. Which would be the better solution for research purposes? My inclination is that redistributing the counts would be fine.
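To make the two options concrete, here is a minimal sketch of the arithmetic in plain Python. It is not the Bioconductor PWM() call; it only computes per-column base proportions. The function names `drop_gaps` and `redistribute_gaps` are made up for illustration, and it assumes "the remaining bases" means A, C, G and T only, since every other row in the PFM is zero.

```python
# Counts copied from the PFM above; all-zero rows are omitted.
pfm = {
    "A": [271, 342, 445, 1017, 547, 648, 673, 722, 660, 935, 793, 531],
    "C": [262, 673, 316, 155, 80, 54, 70, 90, 67, 55, 88, 100],
    "G": [98, 83, 75, 58, 43, 41, 56, 126, 254, 539, 443, 220],
    "T": [532, 468, 872, 597, 1046, 891, 991, 855, 879, 267, 412, 153],
}
gaps = [712, 309, 167, 48, 159, 241, 85, 82, 15, 79, 139, 871]
BASES = "ACGT"  # assumption: gap counts are only ever shared among these four


def drop_gaps(counts):
    """Option 1 (hypothetical helper): ignore '-' and normalise each column over A/C/G/T."""
    n_cols = len(counts["A"])
    totals = [sum(counts[b][j] for b in BASES) for j in range(n_cols)]
    return {b: [counts[b][j] / totals[j] for j in range(n_cols)] for b in BASES}


def redistribute_gaps(counts, gap_counts):
    """Option 2 (hypothetical helper): add a quarter of each column's gap count
    to every base, then normalise each column."""
    shared = {
        b: [counts[b][j] + gap_counts[j] / 4 for j in range(len(gap_counts))]
        for b in BASES
    }
    return drop_gaps(shared)


dropped = drop_gaps(pfm)
redistributed = redistribute_gaps(pfm, gaps)

# Compare the proportion of A in the last column, which has the most gaps.
print(round(dropped["A"][11], 3), round(redistributed["A"][11], 3))
```

In the last column, where 871 of the 1875 sequences have a gap, the proportion of A is about 0.53 if the gaps are dropped and about 0.40 if they are redistributed, so redistributing pulls gap-heavy columns toward a uniform distribution.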