how to find the total number of recognition sequence of a restriction enzyme in total human genome ?
how to find the total number of recognition sequence of a restriction enzyme in total human genome ?
You could accomplish this in R using the DECIPHER package. An example of using EcoRI to digest chromosome 1:
library(DECIPHER)
data(RESTRICTION_ENZYMES) # all restriction enzymes sold by NEB
dna <- readDNAStringSet("ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/CHR_01/hs_ref_GRCh38.p2_chr1.fa.gz")
cuts <- DigestDNA(RESTRICTION_ENZYMES["EcoRI"], dna, type="positions", strand="top")
length(unlist(cuts)) # 68543
Simply loop through the other chromosomes and sum the counts. Hope that helps!
Hi Erik,
Thanks a lot for this nice tool! I had a question related to this. I want to find cut sites in a human genome for 2 restriction enzymes whose names I do not know (they are not disclosed by the company), but I do know their cut sites, which are GATC^ and GANT^C (here N stands for any nucleotide and ^ is the cut site of the enzyme). Do you think your tool can accomplish this? Thanks again!
A little late to the party, however yes, you can accomplish that with a simple addition to the above lines of code. So, after you load the RESTRICTION_ENZYMES with data(RESTRICTION_ENZYMES) add execute the following line:
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
Then you will also have modify the DigestDNA() command:
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
And you are good to go. Replace the "Give_a_code_name_to_your_unknown_enzyme" with any string you want.
So, to summarize, you will have to do:
library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
# The link to get the chromosomal sequences is outdated, so I'll use one that is as of Sep 7th 2020 valid:
dna <- readDNAStringSet("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr1.fa.gz")
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
length(unlist(cuts))
Now, if you are not familiar with loops or the R syntax, to get the total number of fragments expected to be generated with digestion with the specific restriction enzyme, you can modify the above block of code as:
library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme"]]<-"GANT/C"
total_fragments<-0 # a variable in which we will store the fragments count
chr_list<-c(as.character(seq(1:21)),'X','Y','M') # generating a vector with the human chromosomes
for (i in chr_list){ # iterating through the vector with the chromosomes
dna <- readDNAStringSet(paste0("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr",i,".fa.gz"))
cuts <- DigestDNA(RESTRICTION_ENZYMES["Give_a_code_name_to_your_uknown_enzyme"], dna, type="positions", strand="top")
total_fragments<-total_fragments+length(unlist(cuts))
}
total_fragments
And unless there's a typo, this should give you the number of fragments using a given enzyme. Note that with i iterating through the chr_list above, we are not fetching the unlocalized (chrUn_N) and unplaced sequences (chrN_random).
Last, if you want to digest with multiple restriction enzymes:
library(DECIPHER)
data(RESTRICTION_ENZYMES)
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme1"]]<-"GANT/C"
RESTRICTION_ENZYMES[["Give_a_code_name_to_your_unknown_enzyme2"]]<-"GATC/"
multiple_enzymes <- RESTRICTION_ENZYMES[c("Give_a_code_name_to_your_unknown_enzyme1","Give_a_code_name_to_your_unknown_enzyme2")]
total_fragments<-0 # a variable in which we will store the fragments count
chr_list<-c(as.character(seq(1:21)),'X','Y','M') # generating a vector with the human chromosomes
for (i in chr_list){ # iterating through the vector with the chromosomes
dna <- readDNAStringSet(paste0("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/chromosomes/chr",i,".fa.gz"))
cuts <- DigestDNA(multiple_enzymes, dna, type="positions", strand="top")
total_fragments<-total_fragments+length(unlist(cuts))
}
total_fragments
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
duplicate of Genomic Restriction Finder