I have a PWM (position weight matrix, with site-specific frequencies) for a motif. How can I use it in Biopython? The Motif module in Biopython (Bio.Motif) does not seem to parse the PWM format directly.
Update: Here it is:
>consec1
A [ 0.0726 0.3307 0.9284 0.4731 -0.0761 1.9941 -0.8980 -0.8980 ]
C [ 0.7140 0.9354 -0.0167 1.0279 1.1967 -0.7772 -0.8743 -0.8743 ]
G [ 0.5377 0.3913 0.7350 -0.0072 0.3856 -0.8254 -0.8254 1.6783 ]
T [ 0.3675 -0.1780 -0.4293 0.1086 -0.3954 -0.9879 2.3190 -0.9879 ]
I tried different things and received different errors. One of them was about the header; when I removed the header, I got: "UnboundLocalError: local variable 'inst' referenced before assignment"
The code I used:
>from Bio import Motif
>motif = Motif.read(open("consec1.pfm"),"jaspar-sites")
and I get the following error:
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
/mnt/XI/home/png/ant/<ipython-input-25-d60b25d64c48> in <module>()
----> 1 m = Motif.read(open("consec1.pfm"),"jaspar-sites")
/usr/prog/python/2.6.6_gnu/lib/python2.6/site-packages/Bio/Motif/__init__.pyc in read(handle, format)
121 iterator = parse(handle, format)
122 try:
--> 123 first = iterator.next()
124 except StopIteration:
125 first = None
/usr/prog/python/2.6.6_gnu/lib/python2.6/site-packages/Bio/Motif/__init__.pyc in parse(handle, format)
74 raise ValueError("Wrong parser format")
75 else: #we have a proper reader
---> 76 yield reader(handle)
77 else: # we have a proper reader
78 for m in parser(handle).motifs:
/usr/prog/python/2.6.6_gnu/lib/python2.6/site-packages/Bio/Motif/__init__.pyc in _from_sites(handle)
23
24 def _from_sites(handle):
---> 25 return Motif()._from_jaspar_sites(handle)
26
27 _readers={"jaspar-pfm": _from_pfm,
/usr/prog/python/2.6.6_gnu/lib/python2.6/site-packages/Bio/Motif/_Motif.pyc in _from_jaspar_sites(self, stream)
560 self.add_instance(inst)
561
--> 562 self.set_mask("*"*len(inst))
563 return self
564
UnboundLocalError: local variable 'inst' referenced before assignment
-----------------------------------------------------------------------------------------
Another time, Python parsed the file without any error messages, but the consensus sequence (motif.consensus()) did not match the matrix values. For that attempt, I wrote the matrix as rows and columns of tab-separated numbers, without any letters or headers.
The matrix then was:
0.0726 0.3307 0.9284 0.4731 -0.0761 1.9941 -0.8980 -0.8980
0.7140 0.9354 -0.0167 1.0279 1.1967 -0.7772 -0.8743 -0.8743
0.5377 0.3913 0.7350 -0.0072 0.3856 -0.8254 -0.8254 1.6783
0.3675 -0.1780 -0.4293 0.1086 -0.3954 -0.9879 2.3190 -0.9879
The code I used the second time:
>from Bio import Motif
>motif = Motif.read(open("consec1.pfm"),"jaspar-sites")
>motif.consensus()
Output: Seq('CCACCATT', IUPACUnambiguousDNA())
But I am expecting ccaccatG (a G, not a T, in the last position).
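The expected consensus can be checked by hand, without Biopython, by taking the highest-scoring base in each column. This is only a minimal sketch: it assumes the four unlabelled rows are in A, C, G, T order, as in the labelled file above, and `consensus_from_rows` is a hypothetical helper name, not a Biopython function.

```python
# Rows copied from the matrix above; the A, C, G, T row order is an assumption
# taken from the labelled version of the file.
ROWS = {
    "A": [0.0726, 0.3307, 0.9284, 0.4731, -0.0761, 1.9941, -0.8980, -0.8980],
    "C": [0.7140, 0.9354, -0.0167, 1.0279, 1.1967, -0.7772, -0.8743, -0.8743],
    "G": [0.5377, 0.3913, 0.7350, -0.0072, 0.3856, -0.8254, -0.8254, 1.6783],
    "T": [0.3675, -0.1780, -0.4293, 0.1086, -0.3954, -0.9879, 2.3190, -0.9879],
}


def consensus_from_rows(rows):
    """Return the letter with the largest value in each column."""
    letters = sorted(rows)
    length = len(rows[letters[0]])
    return "".join(
        max(letters, key=lambda letter: rows[letter][i]) for i in range(length)
    )


print(consensus_from_rows(ROWS))  # CCACCATG
```

The last column has its maximum in the G row (1.6783), which is why a G is expected there.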
Thanks for the help.
Could you clarify - do you get an error from Biopython? If so, what is the error? Which version of Biopython do you have? Where did the PWM come from, and what format is it in? etc.
See also: How to ask Good Questions on Technical and Scientific Forums
I found the PWM in a book, where it is given simply as rows and columns. I then tried to create a JASPAR-like format and used Motif.read() from Biopython to parse it. But it doesn't work.
Just saying "it doesn't work" isn't enough for anyone to help you. At a minimum, please add the error message (traceback) Python gives you.
Also, it sounds like the problem is your input data is not exactly in the expected format - can you share your PWM file?
OK, better - now how about the FULL error message, and the Python code used to try and load this example?
I edited your question to mark the sample file, code snippets, and error message with the "code" style (the icon with 0 and 1 in it) because otherwise it was impossible to read.
Regarding this part, "At that point, I created the matrix as columns and rows with tab separated numbers, without any other letters/headers." could you include that sample data file too please?
You can parse this file with the latest code in Bio.motifs:
You probably need Biopython 1.62b for this to work; otherwise, use the latest version of Biopython from GitHub.
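If upgrading is not an option, the bracketed layout in the question is simple enough to read with the standard library alone. The following is only a rough sketch, assuming the file always has a ">name" header followed by four "X [ ... ]" rows. `parse_bracketed_matrix` is a hypothetical helper, not part of Biopython and not a stand-in for the Bio.motifs parser.

```python
import re

# Sample text copied from the question; in practice it would be read from a file.
SAMPLE = """\
>consec1
A [ 0.0726 0.3307 0.9284 0.4731 -0.0761 1.9941 -0.8980 -0.8980 ]
C [ 0.7140 0.9354 -0.0167 1.0279 1.1967 -0.7772 -0.8743 -0.8743 ]
G [ 0.5377 0.3913 0.7350 -0.0072 0.3856 -0.8254 -0.8254 1.6783 ]
T [ 0.3675 -0.1780 -0.4293 0.1086 -0.3954 -0.9879 2.3190 -0.9879 ]
"""

# One matrix row: a base letter, then whitespace-separated numbers in brackets.
ROW_PATTERN = re.compile(r"^\s*([ACGT])\s*\[(.*)\]\s*$")


def parse_bracketed_matrix(text):
    """Parse a '>name' header plus 'X [ v1 v2 ... ]' rows into (name, rows)."""
    name = None
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = ROW_PATTERN.match(line)
        if match is None:
            raise ValueError("Unrecognised line: %r" % line)
        letter, values = match.groups()
        rows[letter] = [float(v) for v in values.split()]
    # Require all four bases, each with the same number of columns.
    lengths = {len(v) for v in rows.values()}
    if set(rows) != set("ACGT") or len(lengths) != 1:
        raise ValueError("Expected four rows (A, C, G, T) of equal length")
    return name, rows


name, rows = parse_bracketed_matrix(SAMPLE)
print(name, len(rows["A"]))  # consec1 8
```

The resulting dictionary of rows keeps the values as floats, negative entries included, so later steps can decide how to interpret them.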
Michiel - can you post that below as a suggested answer, rather than here as a comment? Does it give the expected consensus sequence?