How to get the coordinates of CpG sites in a non-human genome
Hi there,
I want to get the exact position of every individual CpG site (not CpG islands) in a non-human genome. Can anyone help?
Thanks!
Joe
Tags: genome, Methylation, non-human genome
Updated 6.8 years ago by ATpoint; written 6.8 years ago by Joe.
fastaRegexFinder, a program I wrote, could help here. You can get the positions of the CpGs in BED format with something like:
```bash
fastaRegexFinder.py -f genome.fa -r CG --noreverse > CpG.bed
```
That said, finding CpGs is easy enough if you want to have a go at writing your own script.
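If you do write your own, the core of it is a single pattern search. Below is a minimal Python sketch, assuming the chromosome sequence is already loaded into a string; the function name `find_cpg` is hypothetical. Because CG is its own reverse complement, scanning the forward strand finds every CpG, and because two CG matches can never overlap, `re.finditer` does not miss any (e.g. both sites in `CGCG`). Matching is case-insensitive so that soft-masked (lowercase) repeat regions are included.

```python
import re

# "CG" is its own reverse complement, so one pass over the forward strand
# finds every CpG; CG matches cannot overlap, so finditer misses none.
# IGNORECASE keeps CpGs inside soft-masked (lowercase) repeats.
CPG = re.compile("CG", re.IGNORECASE)


def find_cpg(chrom, seq):
    """Return (chrom, start, end) tuples in 0-based, end-exclusive BED coordinates."""
    return [(chrom, m.start(), m.end()) for m in CPG.finditer(seq)]


if __name__ == "__main__":
    for chrom, start, end in find_cpg("chr1", "ttCGaCGCGn"):
        print(f"{chrom}\t{start}\t{end}")
```

Running it prints three tab-separated BED lines: `chr1 2 4`, `chr1 5 7` and `chr1 7 9`.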
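Here is a solution in R that gets the CpG coordinates from any genome available as a BSgenome package (many non-human genomes are):
```r
library(BSgenome.Hsapiens.UCSC.hg38)  # swap in the BSgenome package for your organism
library(Biostrings)
library(GenomicRanges)
library(parallel)

## Returns a GRanges with one 2-bp range per CpG (1-based, forward-strand coordinates)
Find_CpG <- function(Genome, Cores = 1) {
  if (!is(Genome, "BSgenome")) stop("Genome must be a BSgenome!")
  chroms <- seqnames(Genome)
  ## start position of every "CG" match, one vector per sequence
  CpG <- mclapply(chroms, function(x) start(matchPattern("CG", Genome[[x]])),
                  mc.cores = Cores)
  ## build one GRanges per sequence, then concatenate them
  gr <- mclapply(seq_along(chroms), function(i)
          GRanges(chroms[i], IRanges(CpG[[i]], width = 2)),
        mc.cores = Cores)
  ## combining objects with different seqlevels emits harmless warnings
  suppressWarnings(do.call(c, gr))
}

## Example:
hg38.CpG <- Find_CpG(Genome = BSgenome.Hsapiens.UCSC.hg38, Cores = 8)
```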
What format is your data in? FASTA, I presume?
If so, this can be handled with a short homemade Python script that iterates through the sequence and writes the position of every CG dinucleotide to an output file. I'd be happy to put something along those lines together if that's the case.
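A minimal sketch of such a script, assuming plain (uncompressed) FASTA input; the script name `cpg_bed.py` and the function `cpg_bed` are hypothetical. It streams the file line by line, so the whole genome never has to sit in memory, and writes BED lines (0-based start, end-exclusive). The one subtlety is that FASTA sequences are wrapped, so a C at the end of one line and a G at the start of the next must still be reported.

```python
import sys


def cpg_bed(lines):
    """Yield BED lines (chrom, 0-based start, end) for each CpG in FASTA input.

    The last base of each sequence line is carried over so that a CpG split
    across a line break is still detected. Input is upper-cased, so CpGs in
    soft-masked (lowercase) regions are included.
    """
    chrom, pos, prev = None, 0, ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            chrom = line[1:].split()[0]  # sequence name = first word of the header
            pos, prev = 0, ""
            continue
        seq = line.upper()
        # pos = genome offset of seq[0]; prev = last base of the previous line
        if prev == "C" and seq[0] == "G":
            yield f"{chrom}\t{pos - 1}\t{pos + 1}"
        i = seq.find("CG")
        while i != -1:
            yield f"{chrom}\t{pos + i}\t{pos + i + 2}"
            i = seq.find("CG", i + 2)  # CG matches cannot overlap
        pos += len(seq)
        prev = seq[-1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python cpg_bed.py genome.fa > CpG.bed
    with open(sys.argv[1]) as fasta:
        for bed_line in cpg_bed(fasta):
            print(bed_line)
```

For a gzipped genome, `gzip.open(path, "rt")` can be passed to `cpg_bed` in place of `open`.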
Hi aays,
Thanks for your comments. Can you please share your Python script?