how to find the total number of recognition sequence of a restriction enzyme in total human genome ?
1
0
Entering edit mode
8.9 years ago
reihaneh • 0

how to find the total number of recognition sequence of a restriction enzyme in total human genome ?

genome gene • 4.9k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
3
Entering edit mode
8.9 years ago
Erik Wright ▴ 420

You could accomplish this in R using the DECIPHER package. An example of using EcoRI to digest chromosome 1:

library(DECIPHER)
data(RESTRICTION_ENZYMES) # all restriction enzymes sold by NEB
dna <- readDNAStringSet("ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_01/hs_ref_GRCh38.p2_chr1.fa.gz")
cuts <- DigestDNA(RESTRICTION_ENZYMES["EcoRI"], dna, type="positions", strand="top")
length(unlist(cuts)) # 68543

Simply loop through the other chromosomes and sum the counts. Hope that helps!

ADD COMMENT
0
Entering edit mode

Hi Erik,

Thanks a lot for this nice tool! I had a question related to this. I want to find cut sites in a human genome for 2 restriction enzymes whose names I do not know (they are not disclosed by the company), but I do know their cut sites, which are GATC^ and GANT^C (here N stands for any nucleotide and ^ is the cut site of the enzyme). Do you think your tool can accomplish this? Thanks again!

ADD REPLY
0
Entering edit mode

A little late to the party, however yes, you can accomplish that with a simple addition to the above lines of code. So, after you load the RESTRICTION_ENZYMES with data(RESTRICTION_ENZYMES) add execute the following line:

RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"

Then you will also have modify the DigestDNA() command:

cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")

And you are good to go. Replace the "Give_a_code_name_to_your_unknown_enzyme" with any string you want.

So, to summarize, you will have to do:

library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
# The link to get the chromosomal sequences is outdated, so I'll use one that is as of Sep 7th 2020 valid:
dna <- readDNAStringSet("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr1.fa.gz")
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
length(unlist(cuts))

Now, if you are not familiar with loops or the R syntax, to get the total number of fragments expected to be generated with digestion with the specific restriction enzyme, you can modify the above block of code as:

library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
total_fragments<-0 # a variable in which we will store the fragments count
chr_list<-c(as.character(seq(1:21)),'X','Y','M') # generating a vector with the human chromosomes
for (i in chr_list){ # iterating through the vector with the chromosomes
dna <- readDNAStringSet(paste0("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr",i,".fa.gz"))
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
total_fragments<-total_fragments+length(unlist(cuts))
}
total_fragments

And unless there's a typo, this should give you the number of fragments using a given enzyme. Note that with i iterating through the chr_list above, we are not fetching the unlocalized (chrUn_N) and unplaced sequences (chrN_random).

Last, if you want to digest with multiple restriction enzymes:

library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme1"]]<-"GANT/C"
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme2"]]<-"GATC/"
multiple_enzymes <- RESTRICTION_ENZYMES[c("Give_a_code_name_to_your_unknown_enzyme1","Give_a_code_name_to_your_unknown_enzyme2")]
total_fragments<-0 # a variable in which we will store the fragments count
chr_list<-c(as.character(seq(1:21)),'X','Y','M') # generating a vector with the human chromosomes
for (i in chr_list){ # iterating through the vector with the chromosomes
dna <- readDNAStringSet(paste0("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr",i,".fa.gz"))
cuts <- DigestDNA(multiple_enzymes, dna, type="positions", strand="top")
total_fragments<-total_fragments+length(unlist(cuts))
}
total_fragments
ADD REPLY

Login before adding your answer.

Traffic: 2507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6