I am analyzing data (from the TCGA project) of patients affected by Glioblastoma Multiforme and, specifically, I want to compare Gene Expression values with Methylation levels.
Methylation levels were obtained with the Illumina Infinium HumanMethylation27 BeadChip*, which measures methylation at ~27k CpG sites; I downloaded its product support file**.
Here is the issue: for many genes there are several probes (hence, CpG sites) that regulate the same gene. I was wondering what the best way would be to treat them as a single entity, so as to obtain one methylation level per gene.
I was thinking of taking the average of all the probes that control one specific gene, but this assumes that all CpGs are equally important as regulators of gene expression, and I don't know if I can justify that.
* https://support.illumina.com/array/array_kits/infinium_humanmethylation27_beadchip_kit.ilmn
** http://support.illumina.com/downloads/humanmethylation27_product_support_files.ilmn
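For reference, the averaging idea from the question can be sketched in a few lines of Python. This is only an illustration: the probe IDs, gene names and beta values are made up, and `average_by_gene` is a hypothetical helper, not part of any Illumina or TCGA tool. In practice the probe-to-gene mapping would come from the product support file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical probe -> gene annotation (assumed; normally read from the manifest).
probe_to_gene = {
    "cg_a": "GENE1",
    "cg_b": "GENE1",
    "cg_c": "GENE2",
}

# Hypothetical beta values: probe -> one value per sample, same sample order for all probes.
beta = {
    "cg_a": [0.10, 0.20, 0.30],
    "cg_b": [0.30, 0.40, 0.50],
    "cg_c": [0.80, 0.70, 0.90],
}

def average_by_gene(beta, probe_to_gene):
    """Return gene -> per-sample mean beta over all probes annotated to that gene."""
    grouped = defaultdict(list)
    for probe, values in beta.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column, so every probe gets equal weight.
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

gene_level = average_by_gene(beta, probe_to_gene)
```

Note that the equal weighting is exactly the assumption questioned above: every CpG contributes the same amount to the gene-level value.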
This is an excellent question. How to summarise methylation probes to gene level is an issue that is routinely ignored or glossed over in publications on this topic. I call it the 'genes x samples' problem, because statistics papers always talk about "matrices of genes x samples" with no indication of how they were derived.
Thanks. Though not a definitive answer, it provides very useful insight. I will proceed by taking one probe per gene.
Hey there! Did you find a method to do this job? I recently ran into the same problem. Thanks a lot! Wayne
As suggested in Neilfws's comment I decided to choose the probe with the highest variance.
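In case it helps, here is a minimal sketch of that selection in Python. The probe IDs, gene names and beta values are invented for illustration, `most_variable_probe` is a hypothetical helper name, and I am assuming sample variance across patients as the ranking criterion.

```python
from statistics import variance

# Hypothetical probe -> gene annotation (assumed; normally read from the manifest).
probe_to_gene = {
    "cg_a": "GENE1",
    "cg_b": "GENE1",
    "cg_c": "GENE2",
}

# Hypothetical beta values: probe -> one value per sample.
beta = {
    "cg_a": [0.10, 0.12, 0.11],
    "cg_b": [0.20, 0.60, 0.90],
    "cg_c": [0.80, 0.70, 0.90],
}

def most_variable_probe(beta, probe_to_gene):
    """Return gene -> the probe with the highest sample variance across samples."""
    best = {}  # gene -> (probe, variance) of the best probe seen so far
    for probe, values in beta.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # skip probes without a gene annotation
            continue
        var = variance(values)  # needs at least two samples
        if gene not in best or var > best[gene][1]:
            best[gene] = (probe, var)
    return {gene: probe for gene, (probe, _) in best.items()}

selected = most_variable_probe(beta, probe_to_gene)
# The gene-level methylation is then simply the beta values of the selected probe.
gene_level = {gene: beta[probe] for gene, probe in selected.items()}
```

Ties are resolved in favour of the first probe encountered, which is an arbitrary choice.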