7.4 years ago
Joe
Hi there,
I want to find the exact positions of CpG sites (not CpG islands) in a non-human genome. Can anyone help?
Thanks! Joe
This program I wrote, fastaRegexFinder, could help. You can get the CpG positions in BED format with something like:
fastaRegexFinder.py -f genome.fa -r CG --noreverse > CpG.bed
That said, finding CpGs is easy enough if you want to have a go at writing your own script.
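For anyone who does want to write their own, here is a minimal Python sketch of the same idea. It is not fastaRegexFinder's code. It reads FASTA lines, regex-searches each sequence for `CG`, and yields BED lines (0-based start, exclusive end). `CG` is its own reverse complement, so scanning one strand finds every site, which is presumably why the command above passes `--noreverse`. Assumptions: the function names `read_fasta` and `cpg_bed` are hypothetical, the sequence name is taken as the first word of the header, and each whole record is held in memory.

```python
import re
from typing import Iterable, Iterator, Tuple

# "CG" cannot overlap itself, so non-overlapping regex matches find every site.
# IGNORECASE also catches soft-masked (lowercase) bases; drop it to skip them.
CPG = re.compile("CG", re.IGNORECASE)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header line."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            # Joining wrapped lines means a CG split across a line break is still found.
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cpg_bed(lines: Iterable[str]) -> Iterator[str]:
    """Yield one BED line (0-based start, exclusive end) per forward-strand CpG."""
    for name, seq in read_fasta(lines):
        for match in CPG.finditer(seq):
            yield f"{name}\t{match.start()}\t{match.end()}"
```

Usage: `with open("genome.fa") as fa: print("\n".join(cpg_bed(fa)))`. Note that BED starts are 0-based, so they are one less than the 1-based starts you get from Bioconductor's `GRanges`.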
Here is a solution in R that gets the CpG coordinates from any BSgenome:
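library(BSgenome.Hsapiens.UCSC.hg38)
library(Biostrings)
library(GenomicRanges)
library(parallel)

## Return every "CG" dinucleotide in a BSgenome as a GRanges (1-based starts, width 2)
Find_CpG <- function(Genome, Cores = 1){
  if (!is(Genome, "BSgenome")) stop("Genome must be a BSgenome!")
  chroms <- seqlevels(Genome)
  ## One matchPattern() call per sequence; mclapply() forks, so Cores > 1 does not work on Windows
  CpG <- mclapply(chroms, function(x) start(matchPattern("CG", Genome[[x]])), mc.cores = Cores)
  ## Build a single GRanges directly instead of concatenating one object per chromosome
  GRanges(seqnames = rep(chroms, lengths(CpG)),
          ranges = IRanges(start = unlist(CpG), width = 2),
          seqinfo = seqinfo(Genome))
}
## Example:
hg38.CpG <- Find_CpG(Genome = BSgenome.Hsapiens.UCSC.hg38, Cores = 8)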
What format is your data in? FASTA, I presume?
If so, this is something that can be handled with a short homemade Python script, iterating through the sequence and writing instances of CG dinucleotide pairs into an output file. I'd be happy to throw something along those lines together if that's the case.
Hi aays,
Thanks for your comments. Can you please share your Python script?
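aays's script was not posted in this thread, so the following is only a minimal sketch of the approach they describe, not their code: iterate through the sequence one base at a time and write each CG pair to an output file. Unlike loading whole chromosomes into memory, it streams the FASTA line by line and carries the previous base across line breaks, so a CG split over two lines is still caught. Assumptions: the function name `stream_cpg_to_bed` is hypothetical, the record name is the first word of the header, output is BED (0-based start, exclusive end), and soft-masked lowercase bases are counted.

```python
from typing import Callable, Iterable


def stream_cpg_to_bed(fasta_lines: Iterable[str], write: Callable[[str], object]) -> int:
    """Scan FASTA one base at a time and write each forward-strand CpG as a BED line.

    `write` is any callable taking a string, e.g. an open file's .write method.
    Returns the number of CpG sites written.
    """
    name = ""
    pos = 0      # 0-based position of the next base in the current record
    prev = ""    # previous base, carried across line breaks
    count = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            pos, prev = 0, ""  # new record: reset position and context
            continue
        for base in line.upper():  # upper() so soft-masked bases count too
            if prev == "C" and base == "G":
                # The C sits at pos - 1, the G at pos; BED end is exclusive.
                write(f"{name}\t{pos - 1}\t{pos + 1}\n")
                count += 1
            prev = base
            pos += 1
    return count
```

Usage: `with open("genome.fa") as fa, open("CpG.bed", "w") as bed: stream_cpg_to_bed(fa, bed.write)`. A pure-Python per-base loop is slow on a large genome; the regex approach above is usually faster, at the cost of holding each chromosome in memory.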