How to use fastools digest function to cut the positive strand of DNA exclusively?

0

Entering edit mode

9.1 years ago

bright602 ▴ 50

I am currently dealing with some fasta files using fastools to digest the sequence by MboI restriction enzyme. The command I used is as follow fastools -digest -rest MboI -in A.fasta -out REA.fasta This enzyme cuts GATC sites, but the output also contain the complementary strand information which starts with CTAG. Therefore, I wanna ask whether it is possible to let the program only cut GATC instead of cutting CTAG? Or is there any other program to cut the DNA sequence in fasta file based on restriction site? Thanks!

gene sequence • 3.2k views

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 9.1 years ago by bright602 ▴ 50

1

Entering edit mode

9.1 years ago

GouthamAtla 12k

import re,sys
from Bio import SeqIO

def print_ReSites(id,seqence):
    pattern=r"GATC"
    seq_len=len(seqence)
    sites = [str(m.start()) for m in re.finditer(pattern,seqence)]
    sites.append(str(seq_len))
    for start,end in zip(sites,sites[1:]):
        print id+"\t"+start+"\t"+end

for seq in SeqIO.parse(sys.argv[1],"fasta"):
    print_ReSites( str( seq.id),str(seq.seq))

Usage:

script.py in.fasta | bedtools getfasta -fi in.fasta -bed - -fo re_out.fasta

Example:

$ cat in.fasta

>1 
AGAGGAGGATCGAGGAGGTGATCGAGGATTTTGAGAGGAGGATCGAGGAGGTGATCGAGGATTTTG
>2 
GAGGGGGCTGGCGGCGGGATCGGAGGGGatttaggaGATCgaggattg

$ cat re_out.fasta

>1:7-19
GATCGAGGAGGT
>1:19-40
GATCGAGGATTTTGAGAGGAG
>1:40-52
GATCGAGGAGGT
>1:52-66
GATCGAGGATTTTG
>2:17-36
GATCGGAGGGGatttagga
>2:36-48
GATCgaggattg

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 9.1 years ago by GouthamAtla 12k

0

Entering edit mode

Hi Goutham,

Thanks for your script. I am trying the script while have the syntax error as follow, could you help me how to fix it?

python script.py in.fasta | bedtools getfasta -fi in.fasta -bed - -fo re_out.fasta index file in.fasta.fai not found, generating... File "script.py", line 13 print_ReSites( strseq.id),str(seq.seq)) ^ SyntaxError: invalid syntax

ADD REPLY • link 9.1 years ago by bright602 ▴ 50

0

Entering edit mode

Thank you so much!! It works!

ADD REPLY • link 9.1 years ago by bright602 ▴ 50

0

Entering edit mode

Hi Goutham,

I am also using your script with my own sequences but I have the same error. It seems is because of the extra " ) " at the end of the code. Then I deleted it, and know I have another problem. Could you help me to fix it please? The error:

print_ReSites( strseq.id),str(seq.seq)
NameError: name 'strseq' is not defined

ADD REPLY • link 9.0 years ago by Nicolas Lichilin ▴ 20

1

Entering edit mode

There is a problem with biostars formatting. This should help.

	import re,sys
	from Bio import SeqIO

	def print_ReSites(id,seqence):
	pattern=r"GATC"
	seq_len=len(seqence)
	sites = [str(m.start()) for m in re.finditer(pattern,seqence)]
	sites.append(str(seq_len))
	for start,end in zip(sites,sites[1:]):
	print id+"\t"+start+"\t"+end

	for seq in SeqIO.parse(sys.argv[1],"fasta"):
	print_ReSites(str(seq.id),str(seq.seq))

view raw re_digest.py hosted with ❤ by GitHub

ADD REPLY • link 9.0 years ago by GouthamAtla 12k

1

Entering edit mode

I've added spaces to circumvent that. It is a weird bug. Here's what I tried to pin down the root cause:

Removed the s before the ( (from print_Resites) to check if s( is the problem
Moved the 4-spaces formatted code to back-tick formatted code
Substituted the _ in the method name to __
Used backslash to escape the parentheses (in which case it prints with the backslashes in)
Tried the backticks-surrounding-the-problem-element method that seems to solve similar problems encountered in URLs with parentheses inside markdown.

None of those worked. Maybe Istvan Albert can shed some light - it seems to be a problem with the wmd markdown library.

ADD REPLY • link 9.0 years ago by Ram 45k

0

Entering edit mode

It works!!! Thanks!!

ADD REPLY • link 9.0 years ago by Nicolas Lichilin ▴ 20

0

Entering edit mode

9.1 years ago

piet ★ 1.9k

MboI is a bacterial restriction endonuclease which cleaves double stranded DNA at the palindromic sequence site GATC.

fastools is a computer programm which may be used to identify restriction enzyme cleavage sites in double standed DNA. For your convenience, you feed only the sequence of one strand into this program, and fastools will recognize all cleavage sites on that strand. CTAG is a valid cleavage site for MboI.

ADD COMMENT • link 9.1 years ago by piet ★ 1.9k

Login before adding your answer.