Make My Own Restriction Enzyme In Biopython
11.0 years ago
ppagemccaw ▴ 10

I'd like to write a script to help me sort through possible CRISPR sites in various exons. It seems like I should be able to do this by creating a new instance of AbstractCut in BioPython. But what I can find in the BioPython source seems to strongly discourage this. I think the assumption is that all the restriction enzymes will be created within BioPython and not by the user. (As if the world of discovery starts and stops with REBASE. Not a bad assumption 11 months ago.)

I'd like to get back to the bench and really have limited skills with the informatics. Is there a tutorial or description somewhere of how to create new instances of AbstractCut (or whatever the right class is) with my own arbitrarily chosen restriction sequence? Here is what I tried:

from Bio.Restriction import *

class myCrisprCutter(AbstractCut):
    def __init__(self):
        #AbstractCut.__init__(self)
        print("do something")
        #self.site = "something"
        #self.compsite = re.compile("write a regex that works here")

I get an error saying that AbstractCut is not defined. That should give a sense of how much I have to learn!

Thanks

biopython • 4.3k views

Personally, if you're confident with Python and regex, I'd say skip the BioPython complexity and just use the Python re module. This is a simple string search problem, and I'm not sure BioPython adds anything beyond complexity.
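To make the re suggestion concrete, here is a minimal sketch that uses only the standard library. It assumes an SpCas9-style target (a 20-nt protospacer followed directly by an NGG PAM), which is an assumption of this example and not something specified in the thread. The names TARGET, reverse_complement and find_crispr_sites are hypothetical helpers, not part of BioPython.

```python
import re

# Assumption (not from the thread): SpCas9-style target, i.e. a 20-nt
# protospacer immediately followed by an NGG PAM. The lookahead lets
# overlapping candidates be reported.
TARGET = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_crispr_sites(seq):
    """Return (strand, start, protospacer, pam) tuples.

    start is the 0-based forward-strand coordinate of the leftmost
    base covered by the 23-nt site, whichever strand it is on.
    """
    seq = seq.upper()
    hits = []
    for strand, text in (("+", seq), ("-", reverse_complement(seq))):
        for match in TARGET.finditer(text):
            site = match.group(1)
            start = match.start()
            if strand == "-":
                # Map the reverse-strand offset back to forward coordinates.
                start = len(seq) - (match.start() + len(site))
            hits.append((strand, start, site[:20], site[20:]))
    return hits
```

Candidates that contain an N are skipped because the pattern only accepts A, C, G and T. A different PAM or guide length would only require changing the TARGET pattern.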

11.0 years ago
ppagemccaw ▴ 10

I thought about just using re and avoiding BioPython; looking at the internals, Restriction.search is just a regex call.

But the more I thought about it, the more it seemed that BioPython, or some equivalent module, has a lot of advantages. First, it knows that DNA is not a subset of the English alphabet: it knows about strands, and that N is in a different category than G. Using a tool that understands these differences would make anything I write today much more valuable tomorrow, and more valuable to the people around me. Second, BioPython has tools to connect to the big databases, which would make many more things possible, like having a student input a gene name and get the 'best' CRISPR targets as output. Third, BioPython has a command line interface for things like primer3, which would allow the same script tools to identify genotyping primers. Fourth, CRISPR targets are so common that we can pick and choose among them based on other criteria, for instance whether they generate RFLPs. While not essential, generating an RFLP is nice. Since the easy way to find restriction sites is the Restriction module of BioPython, I thought this would be a good way to proceed.

I guess I got ambitious. Using re to do what I can do by eyeball isn't all that attractive. What is attractive is building a larger tool set.

11.0 years ago
pstew ▴ 50

It looks like you want to make your own class, modeled after the AbstractCut class, that can cut at a given CRISPR site. I think the best way to approach this would be to model your class after one of the palindromic subclasses, using your own sequence(s). Here's a link to the class in the BioPython documentation: http://biopython.org/DIST/docs/api/Bio.Restriction.Restriction.Palindromic-class.html . You can navigate around the different classes/subclasses and also click to view the source code to see a specific example.

As someone above mentioned, it may be faster/easier to write some quick RE that can pick out your sub-sequence of choice from a larger sequence. If you have reservations about not using BioPython, then you can first use BioPython to download or otherwise process the gene/sequences you desire, and then call your RE to pick out the CRISPR site.


This is the point I was getting at: BioPython really has limited value for sequence-based operations. What is worse is that while it appears to support ambiguous IUPAC nucleotide characters, it really doesn't:

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> Seq('AAN', IUPAC.IUPACAmbiguousDNA()) == Seq('AAA', IUPAC.IUPACAmbiguousDNA())
False

Really all that happens is that under the hood BioPython treats sequences like strings, so you gain little advantage in using BioPython to mine for CRISPR sites. As for obtaining sequences, you really want to avoid using BioPython for downloading bulk data. You'd be better off with periodic bulk updates from IMG/NCBI for bacteria or Ensembl/UCSC for eukaryotes.
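If ambiguity-aware matching is what you need, one way to get it with plain re is to expand IUPAC codes in the search pattern into character classes. This is only a sketch of that idea: IUPAC_CLASSES, iupac_to_regex and iupac_match are hypothetical names, and the expansion is applied to the pattern only, so ambiguity codes in the target sequence are not interpreted.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, expanded to regex classes.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern such as 'AAN' into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES[base] for base in pattern.upper()))


def iupac_match(pattern, seq):
    """True if seq (unambiguous DNA) is fully matched by the IUPAC pattern."""
    return iupac_to_regex(pattern).fullmatch(seq.upper()) is not None
```

With this, iupac_match('AAN', 'AAA') is True, whereas a plain string comparison of 'AAN' and 'AAA' is False. A character outside the IUPAC table in the pattern raises KeyError.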
