How to get the coordinates of CpG sites in a non-human genome
6.7 years ago
Joe

Hi there,

I want to find the exact positions of individual CpG sites (not CpG islands) in a non-human genome. Can anyone help?

Thanks! Joe

Tags: genome, Methylation, non-human genome

What format is your data in? FASTA, I presume?

If so, this can be handled with a short homemade Python script that iterates through the sequence and writes the position of every CG dinucleotide to an output file. I'd be happy to put something along those lines together if that's the case.
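Along those lines, here is a minimal Python 3 sketch. It assumes an uncompressed FASTA file and uses the first word of each > header as the sequence name. The script name find_cpg.py and all function names are hypothetical. Sequences are upper-cased so that soft-masked (lowercase) regions are not skipped, and each CpG is written as one 0-based, half-open BED interval on the forward strand. Because CG is its own reverse complement, the reverse strand holds no additional sites.

```python
"""find_cpg.py (hypothetical name): list every CpG in a FASTA file as BED.

Assumes an uncompressed FASTA file; each sequence is held in memory in full.
"""
import sys
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header.

    Sequence lines are joined and upper-cased so soft-masked bases are kept."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def cpg_positions(seq: str) -> Iterator[int]:
    """Yield the 0-based index of the C of every CG dinucleotide."""
    pos = seq.find("CG")
    while pos != -1:
        yield pos
        pos = seq.find("CG", pos + 1)


def cpg_bed_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one tab-separated BED line (chrom, start, end) per CpG.

    BED is 0-based and half-open, so a CpG whose C is at index i spans i to i + 2."""
    for name, seq in read_fasta(lines):
        for pos in cpg_positions(seq):
            yield f"{name}\t{pos}\t{pos + 2}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python find_cpg.py genome.fa > CpG.bed
    with open(sys.argv[1]) as fasta:
        for record in cpg_bed_records(fasta):
            print(record)
```

For a gzipped genome, opening the file with gzip.open(path, "rt") instead of open(path) should be the only change needed.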

Hi aays,

Thanks for your comments. Can you please share your Python script?

6.7 years ago

A program I wrote, fastaRegexFinder, could help you. You can get the positions of CpGs in BED format with something like:

fastaRegexFinder.py -f genome.fa -r CG --noreverse > CpG.bed

That said, finding CpGs is easy enough if you would rather have a go at writing your own script.
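The command above passes --noreverse, which suits this particular pattern. CG is its own reverse complement, so every match on the minus strand covers exactly the same two bases as a match on the plus strand. Searching both strands would at best report each site a second time. The short Python check below illustrates this. It is not how fastaRegexFinder works internally, and the helper names are hypothetical. re.IGNORECASE is used so that soft-masked (lowercase) bases still match.

```python
import re
from typing import List, Tuple

# Complement table for upper- and lower-case A/C/G/T/N; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def forward_hits(seq: str, motif: str = "CG") -> List[Tuple[int, int]]:
    """0-based, half-open (start, end) of each motif match on the given strand.

    re.IGNORECASE lets soft-masked (lower-case) bases match as well."""
    return [(m.start(), m.end()) for m in re.finditer(motif, seq, flags=re.IGNORECASE)]


def reverse_hits(seq: str, motif: str = "CG") -> List[Tuple[int, int]]:
    """Matches on the minus strand, mapped back to plus-strand coordinates."""
    n = len(seq)
    rc_hits = forward_hits(reverse_complement(seq), motif)
    return sorted((n - end, n - start) for start, end in rc_hits)


if __name__ == "__main__":
    seq = "ttCGaCGTcg"
    print(forward_hits(seq))           # [(2, 4), (5, 7), (8, 10)]
    print(reverse_hits(seq))           # same list, because CG is palindromic
    print(reverse_hits("ACAT", "CA"))  # [], a non-palindromic motif differs between strands
```

re.finditer only reports non-overlapping matches. That is harmless for CG, since a CG can never overlap another CG.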

6.7 years ago
ATpoint

A solution in R that gets the CpG coordinates from any BSgenome package, so it also works for non-human genomes that have one (hg38 is just the example here).

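library(BSgenome.Hsapiens.UCSC.hg38)  # example only; load the BSgenome package for your species instead
library(Biostrings)
library(GenomicRanges)
library(parallel)

## Returns a GRanges with one 2-bp range (the C and the G) per CpG, in 1-based coordinates
Find_CpG <- function(Genome, Cores = 1){
  if (!inherits(Genome, "BSgenome")) stop("Genome must be a BSgenome!")

  chroms <- seqlevels(Genome)

  ## Start position of every "CG" match, one integer vector per sequence
  ## (mc.cores > 1 is not supported on Windows; use Cores = 1 there)
  CpG <- mclapply(chroms, function(x) start(matchPattern("CG", Genome[[x]])), mc.cores = Cores)

  ## One GRanges per sequence, concatenated into a single object;
  ## warnings about non-shared seqlevels when combining are suppressed
  suppressWarnings(
    do.call(c, lapply(seq_along(chroms), function(i)
      GRanges(chroms[i], IRanges(CpG[[i]], width = 2))))
  )
}

## Example:
hg38.CpG <- Find_CpG(Genome = BSgenome.Hsapiens.UCSC.hg38, Cores = 8)

## Optionally write the result as BED (rtracklayer writes 0-based BED starts on export):
# rtracklayer::export(hg38.CpG, "CpG.bed")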
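When comparing this GRanges output with a BED file from a script or from fastaRegexFinder, keep the coordinate conventions in mind. GRanges/IRanges are 1-based and closed, while BED is 0-based and half-open, so a CpG with start s and end s + 1 in R becomes s - 1 to s + 1 in BED. The width of 2 also means each range spans both cytosines of the CpG. The plus-strand C sits at s, and the minus-strand C, paired with the G, sits at s + 1. Below is a small Python sketch of these conversions; all function names are hypothetical.

```python
from typing import List, Tuple


def cpg_starts_1based(seq: str) -> List[int]:
    """1-based start of every CG, i.e. the position of its C on the + strand."""
    seq = seq.upper()
    starts: List[int] = []
    pos = seq.find("CG")
    while pos != -1:
        starts.append(pos + 1)  # shift the 0-based Python index to 1-based
        pos = seq.find("CG", pos + 1)
    return starts


def granges_to_bed(start: int, end: int) -> Tuple[int, int]:
    """1-based closed [start, end] (GRanges/IRanges style) -> 0-based half-open BED."""
    return start - 1, end


def bed_to_granges(start: int, end: int) -> Tuple[int, int]:
    """0-based half-open BED [start, end) -> 1-based closed [start, end]."""
    return start + 1, end


def strand_cytosines(start: int) -> List[Tuple[int, str]]:
    """The two cytosines of a CpG starting at 1-based `start`: the + strand C
    at `start`, and the - strand C (paired with the G) at `start + 1`."""
    return [(start, "+"), (start + 1, "-")]


if __name__ == "__main__":
    for s in cpg_starts_1based("ttCGaCG"):
        end_pos = s + 1  # width = 2, as in IRanges(start, width = 2)
        print(s, end_pos, granges_to_bed(s, end_pos), strand_cytosines(s))
```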