I'm looking for a simple method for calculating position-specific weight matrices (PWMs) from a position occurrence matrix, like those found in the JASPAR database:
>MA0004.1 Arnt
A [ 4 19 0 0 0 0 ]
C [16 0 20 0 0 0 ]
G [ 0 1 0 20 0 20 ]
T [ 0 0 0 0 20 0 ]
I need to scan a large collection of sequences, and submitting them to online services would be a complete hassle.
Once I have a PWM I know how to scan sequences, but I'm just having trouble creating them.
Thanks,
Will
PS. I'm looking for an equation (or pseudocode) for finding the PWM, not a library. I plan to implement it in Python and MATLAB. I'd prefer not to write a wrapper around system calls but to actually implement it in my own code. Thanks
Have you checked the Wikipedia page for PSSM? It covers the math behind PWMs / PSSMs in a comprehensive way. It is not clear to me what you mean by position occurrence matrix.
I presumed that by 'position occurrence matrix' they meant a matrix of counts based on experimental data, such as those in JASPAR, rather than frequencies or weights. The Wikipedia page is fine, but I think the primary reference by Hertz and Stormo is preferable, as Will can cite it in any future publications.
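Since the question asks for an equation rather than a library, here is a rough sketch of one common log-odds formulation, not necessarily the exact form used by Hertz and Stormo. For base b at position i, with count n[b][i], column total N[i], background probability p[b] and pseudocount s, the weight is log2(((n[b][i] + s*p[b]) / (N[i] + s)) / p[b]). The function name counts_to_pwm, the pseudocount of 1.0 and the uniform background of 0.25 per base are all assumptions made for illustration; adjust them to suit your data.

```python
import math

def counts_to_pwm(counts, pseudocount=1.0, background=None):
    """Turn a count matrix {base: [count per position]} into log2-odds weights.

    pseudocount and background are illustrative defaults (assumptions),
    not values prescribed by JASPAR or any particular paper.
    """
    bases = sorted(counts)
    if background is None:
        # Assume a uniform background, i.e. 0.25 per base for DNA.
        background = {b: 1.0 / len(bases) for b in bases}
    length = len(counts[bases[0]])
    pwm = {b: [] for b in bases}
    for i in range(length):
        total = sum(counts[b][i] for b in bases)
        for b in bases:
            # The pseudocount keeps zero counts from producing log(0).
            freq = (counts[b][i] + pseudocount * background[b]) / (total + pseudocount)
            pwm[b].append(math.log2(freq / background[b]))
    return pwm

# The MA0004.1 Arnt counts from the question.
arnt = {
    "A": [4, 19, 0, 0, 0, 0],
    "C": [16, 0, 20, 0, 0, 0],
    "G": [0, 1, 0, 20, 0, 20],
    "T": [0, 0, 0, 0, 20, 0],
}
pwm = counts_to_pwm(arnt)
```

The same arithmetic carries over directly to MATLAB as element-wise operations on a 4 x L matrix. A larger pseudocount flattens the weights, so it is worth checking how sensitive your scan results are to that choice.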