How to use fastools digest function to cut the positive strand of DNA exclusively?
2
0
Entering edit mode
8.7 years ago
bright602 ▴ 50

I am currently dealing with some fasta files using fastools to digest the sequence by MboI restriction enzyme. The command I used is as follow fastools -digest -rest MboI -in A.fasta -out REA.fasta This enzyme cuts GATC sites, but the output also contain the complementary strand information which starts with CTAG. Therefore, I wanna ask whether it is possible to let the program only cut GATC instead of cutting CTAG? Or is there any other program to cut the DNA sequence in fasta file based on restriction site? Thanks!

gene sequence • 2.9k views
ADD COMMENT
1
Entering edit mode
8.7 years ago
import re,sys
from Bio import SeqIO

def print_ReSites(id,seqence):
    pattern=r"GATC"
    seq_len=len(seqence)
    sites = [str(m.start()) for m in re.finditer(pattern,seqence)]
    sites.append(str(seq_len))
    for start,end in zip(sites,sites[1:]):
        print id+"\t"+start+"\t"+end

for seq in SeqIO.parse(sys.argv[1],"fasta"):
    print_ReSites( str( seq.id),str(seq.seq))

Usage:

script.py in.fasta | bedtools getfasta -fi in.fasta -bed - -fo re_out.fasta

Example:

$ cat in.fasta

>1 
AGAGGAGGATCGAGGAGGTGATCGAGGATTTTGAGAGGAGGATCGAGGAGGTGATCGAGGATTTTG
>2 
GAGGGGGCTGGCGGCGGGATCGGAGGGGatttaggaGATCgaggattg

$ cat re_out.fasta

>1:7-19
GATCGAGGAGGT
>1:19-40
GATCGAGGATTTTGAGAGGAG
>1:40-52
GATCGAGGAGGT
>1:52-66
GATCGAGGATTTTG
>2:17-36
GATCGGAGGGGatttagga
>2:36-48
GATCgaggattg
ADD COMMENT
0
Entering edit mode

Hi Goutham,

Thanks for your script. I am trying the script while have the syntax error as follow, could you help me how to fix it?

python script.py in.fasta | bedtools getfasta -fi in.fasta -bed - -fo re_out.fasta index file in.fasta.fai not found, generating... File "script.py", line 13 print_ReSites( strseq.id),str(seq.seq)) ^ SyntaxError: invalid syntax

ADD REPLY
0
Entering edit mode

Thank you so much!! It works!

ADD REPLY
0
Entering edit mode

Hi Goutham,

I am also using your script with my own sequences but I have the same error. It seems is because of the extra " ) " at the end of the code. Then I deleted it, and know I have another problem. Could you help me to fix it please? The error:

print_ReSites( strseq.id),str(seq.seq)
NameError: name 'strseq' is not defined
ADD REPLY
1
Entering edit mode

There is a problem with biostars formatting. This should help.

ADD REPLY
1
Entering edit mode

I've added spaces to circumvent that. It is a weird bug. Here's what I tried to pin down the root cause:

  1. Removed the s before the ( (from print_Resites) to check if s( is the problem
  2. Moved the 4-spaces formatted code to back-tick formatted code
  3. Substituted the _ in the method name to __
  4. Used backslash to escape the parentheses (in which case it prints with the backslashes in)
  5. Tried the backticks-surrounding-the-problem-element method that seems to solve similar problems encountered in URLs with parentheses inside markdown.

None of those worked. Maybe Istvan Albert can shed some light - it seems to be a problem with the wmd markdown library.

ADD REPLY
0
Entering edit mode

It works!!! Thanks!!

ADD REPLY
0
Entering edit mode
8.7 years ago
piet ★ 1.9k

MboI is a bacterial restriction endonuclease which cleaves double stranded DNA at the palindromic sequence site GATC.

fastools is a computer programm which may be used to identify restriction enzyme cleavage sites in double standed DNA. For your convenience, you feed only the sequence of one strand into this program, and fastools will recognize all cleavage sites on that strand. CTAG is a valid cleavage site for MboI.

ADD COMMENT

Login before adding your answer.

Traffic: 1713 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6