How to calculate the average methylation level of gene promoter regions from GEO data using R?
5.7 years ago
a511512345 ▴ 190

Hello everyone, I am currently learning to use the TCGA-Assembler package to process methylation arrays from TCGA. TCGA-Assembler provides a function (CalculateSingleValueMethylationData) to calculate the average methylation level of a particular region of a gene.

However, for GEO data, how can I calculate the average methylation level of the gene promoter region using R? I look forward to your reply. Thank you very much.

GEO Methylation • 2.3k views
5.7 years ago

There is no standard package for this, of course. It will take some effort on your part.

Some pointers:

  1. Download the normalised β (beta) methylation values from GEO. You can recognise β values because they range from 0.0 to 1.0. Most methylation data in the GEO series matrix files should already be normalised. There is usually an automated R script that you can use, too: from the main accession page, click on the blue Analyze with GEO2R button.
  2. Download promoter regions as a BED file. You will have to define what counts as a promoter in your study. Generally, there is no clear definition of a promoter, but histone marks such as H3K27ac, H3K4me1, and H3K27me3 are observed at promoters (and enhancers). You can download information for these from the ChromHMM study (do a search). My preference, however, would be to take the data from FANTOM5, a study from Japan whose aim was to define promoter regions.
  3. Summarise methylation by taking the mean across your promoter regions. In R, you can use GenomicRanges for this.
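
To make step 3 concrete, here is a minimal sketch of the overlap-and-average logic. It is written in plain Python purely as an illustration (in R, GenomicRanges would do the overlap step), and every probe ID, gene name, coordinate, and β value below is a hypothetical toy value. It assumes BED-style promoter intervals (0-based start, exclusive end) and 1-based probe coordinates.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy inputs. Real data would come from the GEO series matrix,
# the array annotation, and your promoter BED file.
# Probe positions: probe ID -> (chromosome, 1-based CpG coordinate)
probe_pos = {
    "cg0001": ("chr1", 1050),
    "cg0002": ("chr1", 1200),
    "cg0003": ("chr1", 5000),
    "cg0004": ("chr2", 300),
}
# Beta values: probe ID -> {sample: beta}; None marks a missing value
betas = {
    "cg0001": {"S1": 0.10, "S2": 0.20},
    "cg0002": {"S1": 0.30, "S2": None},
    "cg0003": {"S1": 0.90, "S2": 0.80},
    "cg0004": {"S1": 0.50, "S2": 0.60},
}
# Promoters in BED convention: (chrom, 0-based start, exclusive end, name)
promoters = [
    ("chr1", 1000, 1500, "GENE_A"),
    ("chr2", 0, 400, "GENE_B"),
    ("chr3", 100, 200, "GENE_C"),  # no probes fall in this region
]


def promoter_means(probe_pos, betas, promoters):
    """Return {promoter name: {sample: mean beta of overlapping probes}}."""
    result = {}
    for chrom, start, end, name in promoters:
        # A 1-based position p lies inside BED interval [start, end)
        # exactly when start < p <= end.
        hits = [
            pid
            for pid, (c, p) in probe_pos.items()
            if c == chrom and start < p <= end
        ]
        per_sample = defaultdict(list)
        for pid in hits:
            for sample, beta in betas.get(pid, {}).items():
                if beta is not None:  # skip missing values
                    per_sample[sample].append(beta)
        result[name] = {s: mean(v) for s, v in per_sample.items()}
    return result


means = promoter_means(probe_pos, betas, promoters)
for name, per_sample in means.items():
    print(name, per_sample)
```

The linear scan over all probes for each promoter is fine for a toy example but slow for a full array; a dedicated interval-overlap tool is the better choice at that scale. Promoters with no overlapping probes come back with an empty result rather than a mean.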

Kevin

Edit based on noorpratap's comment: it is highly likely that the methylation array already has many probes that target promoter regions. Thus, why not just use these? Check the array platform and then try to obtain its associated annotation / metadata.


Thank you for your help, I will try it.


I am not familiar with GEO, but the data should contain probe-level information. The genomic locations of the probes can be extracted from the Illumina 450K annotation (the Illumina Manifest), provided the data were generated on that platform. Once you have that, the paper outlines a method for associating a beta value with a gene: if probes are present within TSS200, the mean of all those probes is used; otherwise, the mean of probes in the 1st exon is taken; and if there are no 1st-exon probes either, the mean of probes in TSS1500 is used.
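
The TSS200, then 1st exon, then TSS1500 fallback described above can be sketched roughly as follows. This is an illustration in plain Python rather than R, and the probe IDs, gene names, and β values are hypothetical. It assumes the manifest has already been parsed into (gene, region group) pairs per probe; to my knowledge the 450K manifest stores these as semicolon-separated entries in its UCSC_RefGene_Name and UCSC_RefGene_Group columns, but check the file for your platform.

```python
from statistics import mean

# Hypothetical manifest excerpt: probe ID -> list of (gene, region group).
# One probe can be annotated to several genes or groups.
annotation = {
    "cg0001": [("GENE_A", "TSS200")],
    "cg0002": [("GENE_A", "TSS200"), ("GENE_B", "TSS1500")],
    "cg0003": [("GENE_A", "1stExon")],
    "cg0004": [("GENE_B", "1stExon")],
    "cg0005": [("GENE_C", "TSS1500")],
    "cg0006": [("GENE_D", "Body")],
}
# Hypothetical beta values for a single sample: probe ID -> beta
betas = {
    "cg0001": 0.2,
    "cg0002": 0.4,
    "cg0003": 0.9,
    "cg0004": 0.7,
    "cg0005": 0.5,
    "cg0006": 0.8,
}
# Region groups in order of preference, as described in the post above
PRIORITY = ("TSS200", "1stExon", "TSS1500")


def gene_level_beta(annotation, betas, priority=PRIORITY):
    """Return {gene: mean beta of the first region group that has probes}."""
    # Collect beta values per gene and per region group
    by_gene = {}
    for pid, pairs in annotation.items():
        if pid not in betas:
            continue  # probe absent from the data set
        for gene, group in pairs:
            by_gene.setdefault(gene, {}).setdefault(group, []).append(betas[pid])

    # For each gene, use the first group in the priority list with probes
    out = {}
    for gene, groups in by_gene.items():
        out[gene] = None  # stays None if no preferred group has probes
        for group in priority:
            if group in groups:
                out[gene] = mean(groups[group])
                break
    return out


gene_beta = gene_level_beta(annotation, betas)
for gene, value in gene_beta.items():
    print(gene, value)
```

Genes whose probes fall only outside the three groups (GENE_D here, with a gene-body probe) get `None` rather than a value, so they are easy to filter out afterwards.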


Thanks for the additional information, noorpratap. It reminds me that, in fact, the Illumina 450k methylation metadata indicates whether or not a probe is in a promoter region ('promoter' as defined by Illumina).

a511512345, you may simply want to check whether you already have information on the probes overlapping the promoter regions. Take a look at the Illumina Manifest to which noorpratap refers.
