Hi, I'm wondering whether anyone has calculated the CG content of the probes on the Infinium HumanMethylation450 BeadChip. Should they all be above 0.5? What I found, however, is that none of them has a CG content ratio above 0.5.
For example, the following is one CpG island probe; its CG content ratio is only 0.26 based on the fourth column (AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC). Am I doing the calculation the right way? Thank you very much for your help!
*New edit: to put the question more simply: is there an annotation file for the HumanMethylation450 BeadChip array that gives the CG content of each probe? The annotation file that I used for this question was obtained from: ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/ProductFiles/HumanMethylation450/HumanMethylation450_15017482_v1-2.csv.
Thanks again!
cg00035864 cg00035864 31729416 AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC II AATCCAAAGATGATGGAGGAGTGCCCGCTCATGATGTGAAGTACCTGCTCAGCTGGAAAC[CG]AATTTGAGATAAATTCAAGGGTCTATGTGGACAAGACTGCTAGTGTCTCTCTCTGGATTG 37 Y 8553009 AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG Y 8613009 F TTTY18 NR_001550 TSS1500
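As a quick check of the arithmetic in the question, here is a minimal Python sketch that counts bases in the two 50-mers from the row above. The function and variable names are illustrative only (they do not come from any Illumina tool), and the interpretation in the comments rests on the assumption that the low G count reflects a probe designed against bisulfite-converted DNA.

```python
from collections import Counter

# Fourth column of the row above (probe sequence) and the 50-mer that
# follows the chromosome/position fields in the same row (source sequence).
PROBE_SEQ = "AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC"
SOURCE_SEQ = "AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG"


def gc_fraction(seq):
    """Return the fraction of bases in seq that are G or C."""
    counts = Counter(seq.upper())
    return (counts["G"] + counts["C"]) / len(seq)


probe_gc = gc_fraction(PROBE_SEQ)    # 13 C + 0 G out of 50 bases
source_gc = gc_fraction(SOURCE_SEQ)  # 20 G/C out of 50 bases

print(f"probe GC fraction:  {probe_gc:.2f}")
print(f"source GC fraction: {source_gc:.2f}")
print("G bases in probe:", PROBE_SEQ.count("G"))

# For this row, replacing every G in the source sequence with A reproduces
# the probe sequence, offset by one base.
converted = SOURCE_SEQ.replace("G", "A")
probe_matches_converted = converted[:-1] == PROBE_SEQ[1:]
print("probe matches G->A converted source:", probe_matches_converted)
```

So 0.26 is the right number for that column. The probe contains no G at all, which would explain why its GC fraction sits well below that of the genomic sequence it targets (0.40 here).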
See this paper:
Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array
https://epigeneticsandchromatin.biomedcentral.com/articles/10.1186/1756-8935-6-4
For example, this paragraph:
"CpG dinucleotides are not randomly distributed throughout the genome, most have spontaneously deaminated with the exception of some CpG-enriched regions known as ‘CpG islands’ [13]. About 70% of gene promoters are associated with CpG islands [14] and traditionally gene transcription has been thought to be repressed by the presence of promoter CpG island DNAm [15, 16]. There are different approaches for classifying CpG enrichment, for example, UCSC defines CpG islands based on CG content >50%, Observed/Expected (Obs/Exp) CpG ratio >0.6 and length >200 bps [17]. An alternative classification of CpG islands providing more enrichment discrimination is high-density CpG islands (HCs, CG content >55%, Obs/Exp CpG ratio >0.75 and length >500 bps), intermediate-density CpG islands (ICs, CG content >50%, Obs/Exp CpG ratio >0.48 and length >200 bps) and non-islands (LCs or low-density CpG regions, non-HC/IC regions) [16, 18]. However, the most biologically meaningful definition of CpG enrichment remains to be determined."
and also look at Figure 7:
Figure 7
"Variation of gene feature DNAm within a CpG class. The level of DNAm was plotted as an average ß value for each gene feature in blood. Analyses were conducted within each HIL CpG class due to the large differences in DNAm that were observed between classes. Average ß values varied across probes by (A) gene location, as exemplified by intronic probes and (B) gene components, as exemplified by 5’UTR probes. 5’UTR, 5’ untranslated region; DNAm, DNA methylation; HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore."
Thank you natasha.sernova! However, I didn't see how what you said answered my question. I think my question is: is there any annotation file for the HumanMethylation450 BeadChip array that gives the CG content of each probe? Thank you! I've also edited my original question.
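To connect the CpG island thresholds in the paragraph quoted above to the probe-level question, here is a hedged Python sketch of the two quantities those definitions use. It assumes the commonly used Gardiner-Garden and Frommer form of the ratio, Obs/Exp = (number of CpG * length) / (number of C * number of G). The function names, including the `classify_region` helper, are hypothetical and only encode the HC/IC/LC cut-offs quoted above.

```python
def gc_content(seq):
    """Fraction of G or C bases in a non-empty sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both C and G
    return seq.count("CG") * len(seq) / (n_c * n_g)


def classify_region(seq):
    """Label a region HC, IC or LC using the quoted thresholds."""
    n = len(seq)
    gc, obs_exp = gc_content(seq), cpg_obs_exp(seq)
    if gc > 0.55 and obs_exp > 0.75 and n > 500:
        return "HC"
    if gc > 0.50 and obs_exp > 0.48 and n > 200:
        return "IC"
    return "LC"


# The 50-base probe sequence from the question.
PROBE_SEQ = "AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC"
print("probe:", classify_region(PROBE_SEQ))

# Synthetic regions, made up purely to exercise the thresholds.
print("600 bp of CG repeats:", classify_region("CG" * 300))
print("300 bp of CGCGAT repeats:", classify_region("CGCGAT" * 50))
```

Both definitions require a region longer than 200 bp, so a single 50-base probe can never meet them on its own. The island label in the annotation presumably describes the region surrounding the probe rather than the probe sequence itself.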
I'm not sure that you are doing anything incorrect. The genomic target sequence of the probe (given in the annotation file) can be used to infer the probe sequence, which in turn can be used to calculate the GC content of each probe.
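Along those lines, here is a rough Python sketch that derives per-probe GC content directly from the manifest CSV. It assumes that the column header line starts with `IlmnID`, that the probe and genomic sequences are in columns named `AlleleA_ProbeSeq` and `SourceSeq`, and that the probe rows end at a line beginning with `[` (such as a controls section); please check these against your copy of the file. The function name `probe_gc_table` and the tiny in-memory demo manifest are made up for illustration.

```python
import csv
import io


def gc_fraction(seq):
    """Fraction of G or C bases; NaN for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / len(seq)


def probe_gc_table(lines, id_col="IlmnID",
                   seq_cols=("AlleleA_ProbeSeq", "SourceSeq")):
    """Yield (probe_id, GC of each column in seq_cols) per probe row."""
    lines = iter(lines)
    # Skip any preamble until the column header line is found.
    for line in lines:
        if line.startswith(id_col):
            header = next(csv.reader([line]))
            break
    else:
        raise ValueError(f"no header line starting with {id_col!r}")
    for row in csv.reader(lines):
        if not row or row[0].startswith("["):
            break  # blank line or next section: probe rows are over
        record = dict(zip(header, row))
        yield (record[id_col], *(gc_fraction(record[c]) for c in seq_cols))


# Hypothetical, heavily trimmed stand-in for the real manifest layout.
DEMO_MANIFEST = (
    "Illumina, Inc.\n"
    "[Heading]\n"
    "[Assay]\n"
    "IlmnID,Name,AlleleA_ProbeSeq,SourceSeq\n"
    "cg00035864,cg00035864,"
    "AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC,"
    "AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG\n"
    "[Controls]\n"
    "not,a,probe,row\n"
)

rows = list(probe_gc_table(io.StringIO(DEMO_MANIFEST)))
for probe_id, probe_gc, source_gc in rows:
    print(probe_id, f"{probe_gc:.2f}", f"{source_gc:.2f}")
```

For the real file, the same generator can be fed an open file handle, for example `open(path, newline="")`, instead of the `io.StringIO` demo.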
I've also just looked at an R package that has the same data:
In the Manual page for this package, they give some practical examples that may be of interest.
------------------------------
Note that UCSC also provides genome-wide GC statistics, which you can download from here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ (gc5Base.txt.gz)
I think IlluminaHumanMethylation450k.db is not available anymore. I used FDb.InfiniumMethylation.hg19 instead.