CG contents for probes in Infinium HumanMethylation450 BeadChip Kit
8.2 years ago
Sha Cao • 0

Hi, I'm wondering whether anyone has calculated the CG content of the probes on the Infinium HumanMethylation450 BeadChip. Should they all be above 0.5? What I found, however, is that none of them has a CG content ratio above 0.5.

For example, the following is one CpG island probe, yet its CG content ratio is only 0.26 based on the fourth column (AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC). Am I doing the calculation the right way? Thank you very much for your help!!

*New edit: to put the question more simply: is there any annotation file for the HumanMethylation450 BeadChip array that gives the CG content of each probe? The annotation file that I used for this question was obtained from: ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/ProductFiles/HumanMethylation450/HumanMethylation450_15017482_v1-2.csv.

Thanks again!

cg00035864 cg00035864 31729416 AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC II AATCCAAAGATGATGGAGGAGTGCCCGCTCATGATGTGAAGTACCTGCTCAGCTGGAAAC[CG]AATTTGAGATAAATTCAAGGGTCTATGTGGACAAGACTGCTAGTGTCTCTCTCTGGATTG 37 Y 8553009 AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG Y 8613009 F TTTY18 NR_001550 TSS1500
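
A quick way to check the 0.26 figure is to count G and C bases directly. The sketch below uses base R only; `gc_fraction` is a hypothetical helper name, not a function from any package. Both sequences are copied from the annotation row above: the 50-mer in the fourth column, and the second 50-mer later in the row, which is assumed here to be the source sequence the probe was designed from.

```r
# Hypothetical helper: fraction of G and C bases in a single sequence string.
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / length(bases)
}

# Fourth column of the row for cg00035864 (the probe sequence used in the question).
probe_seq <- "AAAACACTAACAATCTTATCCACATAAACCCTTAAATTTATCTCAAATTC"

# Second 50-mer in the same row (assumed to be the source sequence).
source_seq <- "AGACACTAGCAGTCTTGTCCACATAGACCCTTGAATTTATCTCAAATTCG"

gc_fraction(probe_seq)   # 0.26
gc_fraction(source_seq)  # 0.4
```

The first value reproduces the 0.26 reported in the question, so the arithmetic itself looks right. The second sequence comes out higher, but still below 0.5.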


See this paper:

Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array

https://epigeneticsandchromatin.biomedcentral.com/articles/10.1186/1756-8935-6-4

For example, this paragraph:

"CpG dinucleotides are not randomly distributed throughout the genome, most have spontaneously deaminated with the exception of some CpG-enriched regions known as ‘CpG islands’ [13]. About 70% of gene promoters are associated with CpG islands [14] and traditionally gene transcription has been thought to be repressed by the presence of promoter CpG island DNAm [15, 16]. There are different approaches for classifying CpG enrichment, for example, UCSC defines CpG islands based on CG content >50%, Observed/Expected (Obs/Exp) CpG ratio >0.6 and length >200 bps [17]. An alternative classification of CpG islands providing more enrichment discrimination is high-density CpG islands (HCs, CG content >55%, Obs/Exp CpG ratio >0.75 and length >500 bps), intermediate-density CpG islands (ICs, CG content >50%, Obs/Exp CpG ratio >0.48 and length >200 bps) and non-islands (LCs or low-density CpG regions, non-HC/IC regions) [16, 18]. However, the most biologically meaningful definition of CpG enrichment remains to be determined."
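
To make the thresholds in that paragraph concrete, here is a minimal base-R sketch that computes the three quantities for a sequence and applies the HC/IC/LC cutoffs quoted above. The function names `cpg_stats` and `classify_hil` are hypothetical. The Obs/Exp formula, (number of CpG × length) / (number of C × number of G), is an assumption on my part, since the quoted text does not spell it out. Note also that these definitions apply to genomic regions of 200 bp or more, not to 50-mer probes.

```r
# Hypothetical helper: length, GC fraction and Obs/Exp CpG ratio of one sequence.
# Obs/Exp formula is an assumption: (n_CpG * N) / (n_C * n_G).
cpg_stats <- function(seq) {
  seq <- toupper(seq)
  bases <- strsplit(seq, "")[[1]]
  n <- length(bases)
  n_c <- sum(bases == "C")
  n_g <- sum(bases == "G")
  hits <- gregexpr("CG", seq, fixed = TRUE)[[1]]
  n_cpg <- if (hits[1] == -1) 0 else length(hits)
  obs_exp <- if (n_c > 0 && n_g > 0) n_cpg * n / (n_c * n_g) else NA_real_
  list(length = n, gc = (n_c + n_g) / n, obs_exp = obs_exp)
}

# Hypothetical helper: apply the HC / IC / LC thresholds from the quoted paragraph.
classify_hil <- function(stats) {
  if (is.na(stats$obs_exp)) return("LC")
  if (stats$gc > 0.55 && stats$obs_exp > 0.75 && stats$length > 500) return("HC")
  if (stats$gc > 0.50 && stats$obs_exp > 0.48 && stats$length > 200) return("IC")
  "LC"
}

# Synthetic sequences for illustration only (not real genomic data).
classify_hil(cpg_stats(strrep("CG", 300)))  # 600 bp, all CpG
classify_hil(cpg_stats(strrep("CG", 150)))  # 300 bp, too short for HC
classify_hil(cpg_stats(strrep("AT", 300)))  # no C or G at all
```

Because every class requires a length above 200 bp, a single 50 bp probe sequence can never meet these criteria on its own; the classification describes the region a probe sits in.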

and also look at Figure 7:

"Variation of gene feature DNAm within a CpG class. The level of DNAm was plotted as an average β value for each gene feature in blood. Analyses were conducted within each HIL CpG class due to the large differences in DNAm that were observed between classes. Average β values varied across probes by (A) gene location, as exemplified by intronic probes and (B) gene components, as exemplified by 5’UTR probes. 5’UTR, 5’ untranslated region; DNAm, DNA methylation; HIL, high-density CpG island (HC), intermediate-density CpG island (IC) and non-island (LC); ICshore, intermediate-density CpG island shore."


Thank you natasha.sernova! However, I didn't see how what you said answered my question. I think my question is: is there any annotation file for the HumanMethylation450 BeadChip array that gives the CG content of each probe? Thank you! I've also edited my original question.


I'm not sure that you are doing anything incorrect. The genomic target sequence of the probe (given in the annotation file) can be used to infer the probe sequence, which in turn can be used to infer the GC content of each probe.

I've also just looked at an R package that has the same data:

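# biocLite() has been superseded by BiocManager in current Bioconductor releases
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IlluminaHumanMethylation450kprobe")
library(IlluminaHumanMethylation450kprobe)  # attach the package so data() can find the dataset
data(IlluminaHumanMethylation450kprobe)
head(IlluminaHumanMethylation450kprobe)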
             Probe_ID chr strand     start       end      site probe.sequence
cg00000029 cg00000029  16      -  53468112  53468161  53468112           <NA>
cg00000108 cg00000108   3      +  37459206  37459255  37459206           <NA>
cg00000109 cg00000109   3      - 171916037 171916086 171916037           <NA>
cg00000165 cg00000165   1      +  91194626  91194675  91194674           <NA>
cg00000236 cg00000236   8      +  42263246  42263295  42263294           <NA>
cg00000289 cg00000289  14      -  69341139  69341188  69341139           <NA>
           source.sequence                           forward.genomic.sequence
cg00000029            <NA> CGAAACCTTCACACGTCAGTGTCTTTTGGACATTTTCTCGTCAGTACAGC
cg00000108            <NA> CGGCCAGGATGACAGCGGAGCCAGGATCACCCCAGGTCTGTCTCATTGCA
cg00000109            <NA> CGTATTTAGAAGCCAAGATCTGTGGGGGGGTACATGTGCCTGTTAGTATT
cg00000165            <NA> CGATGTGTGCCTCAGCTGTTCCATCAAAAGCCACTGTACTAACAGATCCT
cg00000236            <NA> CGTGATGTACAAACTGGTGGGTCAGATCGTCTCCTCTAACATGACGCTAC
cg00000289            <NA> CGACTCCCACACCAAAATGGACATGAGATTGGAGAAATGAATACAGCAGA
           CpGs
cg00000029    3
cg00000108    2
cg00000109    1
cg00000165    1
cg00000236    3
cg00000289    1

In the Manual page for this package, they give some practical examples that may be of interest.
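
As a rough illustration of getting a per-probe value out of a table like this, the base-R sketch below rebuilds the six rows printed above as a small data frame and adds two columns. The column names `gc_fraction` and `cpg_count` are my own, not part of the package, and the same two lines should work on the full data frame if it has the `forward.genomic.sequence` column shown in the output.

```r
# The six rows shown in the head() output above, typed in by hand so the
# example runs without installing the package.
probes <- data.frame(
  Probe_ID = c("cg00000029", "cg00000108", "cg00000109",
               "cg00000165", "cg00000236", "cg00000289"),
  forward.genomic.sequence = c(
    "CGAAACCTTCACACGTCAGTGTCTTTTGGACATTTTCTCGTCAGTACAGC",
    "CGGCCAGGATGACAGCGGAGCCAGGATCACCCCAGGTCTGTCTCATTGCA",
    "CGTATTTAGAAGCCAAGATCTGTGGGGGGGTACATGTGCCTGTTAGTATT",
    "CGATGTGTGCCTCAGCTGTTCCATCAAAAGCCACTGTACTAACAGATCCT",
    "CGTGATGTACAAACTGGTGGGTCAGATCGTCTCCTCTAACATGACGCTAC",
    "CGACTCCCACACCAAAATGGACATGAGATTGGAGAAATGAATACAGCAGA"
  ),
  stringsAsFactors = FALSE
)

seqs <- toupper(probes$forward.genomic.sequence)

# GC fraction: strip everything that is not G or C, then compare lengths.
probes$gc_fraction <- nchar(gsub("[^GC]", "", seqs)) / nchar(seqs)

# Number of CG dinucleotides in each sequence.
probes$cpg_count <- lengths(regmatches(seqs, gregexpr("CG", seqs, fixed = TRUE)))

print(probes[, c("Probe_ID", "gc_fraction", "cpg_count")])
```

For these six rows, `cpg_count` should match the CpGs column in the output above (3, 2, 1, 1, 3, 1), which is a useful check that the sequences are being read as intended.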

------------------------------

Note that UCSC also provides genome-wide GC statistics, which you can download from here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ (gc5Base.txt.gz)


I think IlluminaHumanMethylation450k.db is not available anymore. I used FDb.InfiniumMethylation.hg19 instead.
