CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria:
GC content of 50% or greater; length greater than 200 bp; and a ratio greater than 0.6 of the observed number of CG dinucleotides to the number expected on the basis of the number of Gs and Cs in the segment.
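The scoring scan described above can be sketched in Python roughly as follows. This is a simplified illustration, not the original program: the function name score_segments, the rule that a run ends when the running score falls back to zero, and the choice to trim each run at the position of its peak score are all assumptions made for the example.

```python
def score_segments(seq, cg_score=17, other_score=-1):
    """Return (start, end, score) for each maximally scoring run.

    Assumed rules (hypothetical, for illustration only):
    - a run opens at the first CG seen while the running score is zero;
    - a run closes when the running score drops back to zero;
    - the reported end (exclusive) is where the run's score peaked.
    """
    seq = seq.upper()
    segments = []
    running = 0       # current running score
    best = 0          # peak score within the current run
    start = None      # start index of the current run
    best_end = None   # exclusive end index at the peak score

    # Step through the sequence one base at a time, scoring each dinucleotide.
    for i in range(len(seq) - 1):
        step = cg_score if seq[i:i + 2] == "CG" else other_score
        if running == 0:
            if step <= 0:
                continue          # not inside a candidate run
            start = i             # a CG opens a new run
        running += step
        if running > best:
            best = running
            best_end = i + 2
        if running <= 0:          # run exhausted: record it and reset
            segments.append((start, best_end, best))
            running, best, start, best_end = 0, 0, None, None

    # Record a run that is still open at the end of the sequence.
    if start is not None and best > 0:
        segments.append((start, best_end, best))
    return segments


if __name__ == "__main__":
    for start, end, score in score_segments("AAACGCGCGAAAA"):
        print(start, end, score)
```

With a score of +17 per CG and -1 otherwise, a single CG can carry a run across 17 non-CG dinucleotides before the running score returns to zero, which is why sparse CG dinucleotides do not merge into one segment.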
The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length, expressed as a percentage. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al., 1987):
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence.
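As a minimal sketch, the statistics and the three criteria can be computed for a candidate segment as shown below. The function names cpg_stats and meets_criteria and the dictionary keys are hypothetical, and returning 0.0 for the ratio when the segment contains no C or no G is an assumption made to avoid division by zero.

```python
def cpg_stats(segment):
    """Compute the CpG statistics described in the text for one segment."""
    seg = segment.upper()
    n = len(seg)
    num_c = seg.count("C")
    num_g = seg.count("G")
    num_cpg = seg.count("CG")  # CG cannot overlap itself, so count() is exact

    # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
    if num_c and num_g:
        obs_exp = num_cpg * n / (num_c * num_g)
    else:
        obs_exp = 0.0  # assumption: undefined ratio reported as 0.0

    return {
        "length": n,
        "cpg_count": num_cpg,
        # Percentage CpG: bases in CpG dinucleotides (2 * count) over length
        "per_cpg": 100.0 * 2 * num_cpg / n if n else 0.0,
        "per_gc": 100.0 * (num_c + num_g) / n if n else 0.0,
        "obs_exp": obs_exp,
    }


def meets_criteria(stats):
    """Apply the three criteria: GC >= 50%, length > 200 bp, Obs/Exp > 0.6."""
    return (
        stats["per_gc"] >= 50.0
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )


if __name__ == "__main__":
    stats = cpg_stats("CG" * 150)
    print(stats, meets_criteria(stats))
```

For example, a 300 bp segment made entirely of CG repeats has 150 CpGs, 150 Cs and 150 Gs, so Obs/Exp = 150 * 300 / (150 * 150) = 2.0.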
EMBOSS has a number of CpG-related tools (see B.6.16. Applications in group Nucleic:cpg islands). The two CpG island finders, cpgplot and newcpgreport, use slightly different criteria for calling a CpG island.