Question

In silico dual restriction enzyme digestion of a reference genome

8.8 years ago

William ★ 5.3k

I am looking for a tool that can do an in silico digestion of a reference genome given two restriction enzymes (or the patterns at which they cut).

The output I would like is a BED file with the start and end positions on the reference genome of all the restriction fragments produced (or only those longer than a specified length).

The tool should take into account that both the forward and reverse DNA strand could be cut.

Is there any such tool that I can download?

ddradseq insillico • 3.4k views

ADD COMMENT • link updated 18 months ago by tshtatland ▴ 190 • written 8.8 years ago by William ★ 5.3k

You can try restrict from EMBOSS. You would need to make the BED format file from the output of restrict.

8.8 years ago by GenoMax 150k

Duplicate of "Genomic Restriction Finder".

8.8 years ago by Pierre Lindenbaum 166k

8.8 years ago

dariober 15k

This is a hack, but it might work. I wrote a script, fastaRegexFinder, that finds regular expression matches in FASTA files.

To find all the restriction fragments from two enzymes, run fastaRegexFinder.py for each combination of enzyme 1 and enzyme 2 sites in forward and reverse-complement orientation (4 x 4 = 16 runs). For example, if the restriction sites are TTCC and ACTG:

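# Each site followed by its reverse complement
res="TTCC GGAA ACTG CAGT"
for r1 in $res
do
    for r2 in $res
    do
        echo "$r1 and $r2"
        # Quote the pattern so the shell does not interpret the parentheses, * or ?
        fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r "$r1.*?(?=$r2)"
    done
done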

This loop runs the following commands:

fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'TTCC.*?(?=CAGT)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'GGAA.*?(?=CAGT)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'ACTG.*?(?=CAGT)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=TTCC)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=GGAA)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=ACTG)'
fastaRegexFinder.py -q --noreverse -f test_data/synth.fa -r 'CAGT.*?(?=CAGT)'

The output (written to stdout by default) is in BED format.
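For comparison, the same idea can be sketched in a single pass with plain Python: collect every occurrence of either site (and its reverse complement), sort the positions, and emit the intervals between neighbouring sites. This is only a rough sketch and not part of fastaRegexFinder. The function names (`revcomp`, `site_starts`, `double_digest`) are made up for illustration, sites are assumed to contain only A, C, G and T, and each fragment is assumed to start at the first base of a recognition site rather than at the enzyme's true cut offset.

```python
import re

def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def site_starts(seq, site):
    """0-based starts of `site` or its reverse complement, overlaps included."""
    motifs = sorted({site, revcomp(site)})
    # A lookahead makes every match zero-width, so overlapping sites are found.
    pattern = "(?=(" + "|".join(motifs) + "))"
    return [m.start() for m in re.finditer(pattern, seq)]

def double_digest(chrom, seq, site_a, site_b, min_len=0):
    """Return BED-like rows (chrom, start, end, ends) for all fragments.

    `ends` records which enzyme delimits each side, e.g. "A-B".
    Assumption: the cut falls at the first base of each site.
    """
    seq = seq.upper()
    cuts = [(p, "A") for p in site_starts(seq, site_a)]
    cuts += [(p, "B") for p in site_starts(seq, site_b)]
    cuts.sort()
    bounds = [(0, "start")] + cuts + [(len(seq), "end")]
    rows = []
    for (start, left), (end, right) in zip(bounds, bounds[1:]):
        # Skip zero-length intervals and fragments below the size cutoff.
        if end > start and end - start >= min_len:
            rows.append((chrom, start, end, f"{left}-{right}"))
    return rows
```

Printing each row tab-separated gives a BED file; keeping only rows whose fourth field is "A-B" or "B-A" restricts the output to fragments flanked by one site of each enzyme.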

6.9 years ago

c_u ▴ 530

Another good option is a utility called digest_genome.py that comes with HiC-Pro. It digests the reference genome with the provided restriction enzyme(s) and generates a BED file listing the restriction fragments produced by the digestion. You can specify multiple restriction enzymes too.

18 months ago

tshtatland ▴ 190

Use EMBOSS: restrict. From the docs:

Report restriction enzyme cleavage sites in a nucleotide sequence

You have to write a trivial parser to convert the output to BED format.
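The conversion step might look roughly like the sketch below. It does not parse restrict's report itself. It assumes you have already extracted the cut positions as (chromosome, 0-based position) pairs, where a position means "cut immediately before this base", and that you know each chromosome's length. The function name `cuts_to_bed` and both input structures are hypothetical.

```python
from collections import defaultdict

def cuts_to_bed(cuts, chrom_sizes, min_len=0):
    """Turn cut positions into BED lines of fragments.

    cuts        -- iterable of (chrom, pos) pairs, pos 0-based (assumed input)
    chrom_sizes -- dict mapping chrom -> sequence length
    min_len     -- drop fragments shorter than this
    """
    by_chrom = defaultdict(set)
    for chrom, pos in cuts:
        by_chrom[chrom].add(pos)  # a set removes duplicate cut positions
    lines = []
    for chrom, size in chrom_sizes.items():
        # Chromosome ends act as fragment boundaries too.
        bounds = sorted(by_chrom[chrom] | {0, size})
        for start, end in zip(bounds, bounds[1:]):
            if end - start >= min_len:
                lines.append(f"{chrom}\t{start}\t{end}")
    return lines
```

Pooling the cut positions of both enzymes before calling the function yields the double-digest fragments, and `min_len` covers the size filter asked for in the question.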

score 1 · Accepted Answer · 2016-07-11

I found the R package SimRAD, which

provides a number of functions to simulate restriction enzyme digestion, library construction and fragments size selection.

https://cran.r-project.org/web/packages/SimRAD/SimRAD.pdf

A nice feature is that fragments can also be filtered by restriction site combination, keeping only those that start with an enzyme A restriction site and end with an enzyme B restriction site.

adapt.select(sequences, type = "AB+BA", cut_site_5prime1, cut_site_3prime1,
cut_site_5prime2, cut_site_3prime2)

The output is the DNA fragments produced by the digestion and size selection.

The output does not record where the fragments are located on the genome.

To create the BED file, I guess I need to BLAST the fragments against the reference genome.
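If the fragments are exact substrings of the reference they were digested from, a plain exact-match search may be enough and BLAST may not be needed. The sketch below assumes the reference fits in memory as a dict of chromosome name to sequence and that fragments contain only A, C, G and T. `locate_fragments` and the `frag{i}` naming scheme are made up for illustration and are not part of SimRAD.

```python
def revcomp(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def locate_fragments(ref, fragments):
    """Find every exact occurrence of each fragment on either strand.

    ref       -- dict mapping chrom -> sequence (assumed to fit in memory)
    fragments -- iterable of fragment sequences
    Returns BED6-like rows: (chrom, start, end, name, score, strand).
    """
    rows = []
    for i, frag in enumerate(fragments):
        frag = frag.upper()
        if not frag:
            continue
        for strand, query in (("+", frag), ("-", revcomp(frag))):
            if strand == "-" and query == frag:
                continue  # palindromic fragment, already reported on "+"
            for chrom, seq in ref.items():
                seq = seq.upper()
                pos = seq.find(query)
                while pos != -1:
                    rows.append((chrom, pos, pos + len(query), f"frag{i}", 0, strand))
                    pos = seq.find(query, pos + 1)  # allow overlapping hits
    return rows
```

A fragment that occurs more than once is reported at every location, so repeats in the genome produce multiple rows for the same fragment name.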