Question

Make My Own Restriction Enzyme In Biopython

1

11.0 years ago

ppagemccaw ▴ 10

I'd like to write a script to help me sort through possible CRISPR sites in various exons. It seems like I should be able to do this by creating a new subclass of AbstractCut in BioPython, but what I can find in the BioPython source seems to strongly discourage this. I think the assumption is that all restriction enzymes will be created within BioPython and not by the user. (As if the world of discovery starts and stops with REBASE. Not a bad assumption 11 months ago.)

I'd like to get back to the bench, and I have limited informatics skills. Is there a tutorial or description somewhere of how to create new instances of AbstractCut (or whatever the right class is) with my own arbitrarily chosen recognition sequence? This is what I tried:

from Bio.Restriction import *
class myCrisprCutter(AbstractCut):
    def __init__(self):
        # AbstractCut.__init__(self)
        print("do something")
        # self.site = "something"
        # self.compsite = re.compile("write a regex that works here")

I get an error saying AbstractCut is not defined. That should give a sense of how much I have to learn!

Thanks

biopython • 4.3k views

updated 11.0 years ago by pstew ▴ 50 • written 11.0 years ago by ppagemccaw ▴ 10

1

Personally, if you're confident with Python and regex, I'd say skip the BioPython complexity and just use Python's re module. This is a simple string search problem, and I'm not sure BioPython adds anything here beyond complexity.

11.0 years ago by pld 5.1k

score 0 · Answer 1 · 2013-12-05

I thought about just using re and avoiding BioPython, and looking at the internals, Restriction.search is just a regex call.

But the more I thought about it, the more it seemed that BioPython, or some equivalent module, has a lot of advantages. First, it knows that DNA is not a subset of the English alphabet: it knows about strands, and that N is in a different category than G. Using a tool that understands these differences would make anything I write today much more valuable tomorrow, and more valuable to the people around me. Second, BioPython has tools to connect to the big databases, which would make many more things possible, like having a student input a gene name and get back the 'best' CRISPR targets. Third, BioPython has a command line interface for things like primer3, which would allow the same script tools to identify genotyping primers. Fourth, CRISPR targets are so common that we can pick and choose among them based on other criteria, for instance whether they generate RFLPs. While not essential, generating an RFLP is nice. Since the easy way to find restriction sites is the Restriction module of BioPython, I thought this would be a good way to proceed.
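To make the point about N versus G concrete, here is a minimal sketch, in plain Python without BioPython, of translating a recognition site written in IUPAC ambiguity codes into a regex. The table holds the standard IUPAC nucleotide codes; the helper name `site_to_regex` is hypothetical and is not part of BioPython.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def site_to_regex(site):
    """Compile an IUPAC-coded site into a regex (hypothetical helper)."""
    return re.compile("".join(IUPAC[base] for base in site.upper()))


# "N" expands to any base, whereas "G" matches only G.
pam = site_to_regex("NGG")
print(pam.pattern)
print([m.start() for m in pam.finditer("ACGGTTAGGCC")])
```

A plain `re.compile("NGG")` would instead look for a literal N in the sequence, which is exactly the kind of mistake a DNA-aware tool avoids.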

I guess I got ambitious. Using re to do what I can do by eyeball isn't all that attractive. What is attractive is building a larger tool set.

score 0 · Answer 2 · 2013-12-05

It looks like you want to make your own class, modeled after the AbstractCut class, that can recognize a given CRISPR target site. I think the best way to approach this would be to model your class after one of the palindromic subclasses, using your own sequence(s). Here's a link to the class in the BioPython documentation: http://biopython.org/DIST/docs/api/Bio.Restriction.Restriction.Palindromic-class.html . You can navigate around the different classes/subclasses and also click to view the source code to see a specific example.

As someone above mentioned, it may be faster and easier to write a quick regex that picks out your sub-sequence of choice from a larger sequence. If you have reservations about not using BioPython, you can first use BioPython to download or otherwise process the genes/sequences you want, and then call your regex to pick out the CRISPR site.
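A rough sketch of that regex approach, using only the standard library. It assumes an SpCas9-style target (a 20-nt protospacer followed by an NGG PAM), which the question does not actually specify, and the function names are made up for illustration. The sequence is taken as a plain string here; it could just as well come from a BioPython record via `str(record.seq)`.

```python
import re

# Assumption: 20-nt protospacer followed by an NGG PAM (SpCas9-style).
# The lookahead makes overlapping candidates all get reported.
TARGET = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
TARGET_LEN = 23

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_crispr_sites(seq):
    """Return (strand, start, protospacer, pam) tuples.

    start is the 0-based leftmost position of the 23-nt target
    on the sequence as given, for both strands.
    """
    seq = seq.upper()
    hits = []
    for m in TARGET.finditer(seq):
        hits.append(("+", m.start(), m.group(1), m.group(2)))
    for m in TARGET.finditer(reverse_complement(seq)):
        # Map the position on the reverse strand back to forward coordinates.
        start = len(seq) - (m.start() + TARGET_LEN)
        hits.append(("-", start, m.group(1), m.group(2)))
    return hits


exon = "TTT" + "ACGT" * 5 + "AGG" + "TT"
for hit in find_crispr_sites(exon):
    print(hit)
```

Filtering the hits further, for example by whether a restriction site overlaps the cut position, could then be layered on top of the returned tuples.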