9.5 years ago
Ezhil La
Hi,
I am working on 450K BeadChip methylation arrays. There are multiple CpG sites for a gene and I would like to know better ways of collapsing multiple probes into a single one representing a gene. Could you please suggest some methods and also the software for doing this step?
In gene expression arrays, I normally select the probe with the highest variance (or sometimes average the probes) to represent a gene.
Thanks in advance.
Kind regards,
Ezhil
A gene-level expression estimate makes some sense; a gene-level methylation metric less so. A better question to ask yourself is whether you really do want to summarize over whole genes (hint: you probably don't).
Probably not, but I am not sure what the correct way is. I thought of averaging all probes within 200 kb of the transcription start site (TSS) to represent gene-level methylation. Obviously 200 kb is arbitrary, and the assumption that regions close to the TSS matter more than other gene regions made me look for alternative ways.
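To make the windowed-average idea concrete, here is a minimal sketch in plain Python. The function name `gene_level_mean`, the tuple layout, and the toy coordinates and beta values are all assumptions made for illustration; they do not come from any particular package or dataset.

```python
from statistics import mean

def gene_level_mean(probes, tss, window=200_000):
    """Average the beta values of probes lying within `window` bp of each TSS.

    probes: list of (chrom, position, beta) tuples (hypothetical layout)
    tss:    dict mapping gene -> (chrom, tss_position)
    Returns a dict gene -> mean beta, or None if no probe is in the window.
    """
    result = {}
    for gene, (g_chrom, g_pos) in tss.items():
        betas = [
            beta
            for chrom, pos, beta in probes
            if chrom == g_chrom and abs(pos - g_pos) <= window
        ]
        result[gene] = mean(betas) if betas else None
    return result

# Toy data, invented for this example only.
probes = [
    ("chr1", 1_000, 0.10),
    ("chr1", 1_500, 0.30),
    ("chr1", 900_000, 0.90),
    ("chr2", 1_200, 0.50),
]
tss = {"GENE_A": ("chr1", 1_200), "GENE_B": ("chr2", 500_000)}

summary = gene_level_mean(probes, tss, window=2_000)
```

The window is a plain parameter, so it is easy to check how sensitive the per-gene values are to that arbitrary choice.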
Why not select, for each gene, the probe with the highest variance?
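A rough sketch of that selection, assuming beta values are stored per probe across samples and that a probe-to-gene mapping is already available. The names `most_variable_probe`, `beta_by_probe` and `probe_to_gene`, and the probe IDs, are hypothetical.

```python
from statistics import variance

def most_variable_probe(beta_by_probe, probe_to_gene):
    """Pick, for each gene, the probe whose beta values vary most across samples.

    beta_by_probe: dict probe_id -> list of beta values (one per sample)
    probe_to_gene: dict probe_id -> gene name (probes without a gene are skipped)
    Returns a dict gene -> selected probe_id. On ties the first probe seen is kept.
    """
    best = {}  # gene -> (probe_id, variance)
    for probe, betas in beta_by_probe.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        var = variance(betas)  # sample variance; needs at least two samples
        if gene not in best or var > best[gene][1]:
            best[gene] = (probe, var)
    return {gene: probe for gene, (probe, _) in best.items()}

# Toy data, invented for this example only.
beta_by_probe = {
    "probe_1": [0.10, 0.10, 0.10],
    "probe_2": [0.10, 0.50, 0.90],
    "probe_3": [0.20, 0.40, 0.30],
    "probe_4": [0.70, 0.70, 0.80],  # no gene annotation, so it is ignored
}
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

selected = most_variable_probe(beta_by_probe, probe_to_gene)
```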
Here's what I do, in a nutshell: for each methylation site, link it to its nearest neighbouring gene and to its 2 nearest methylation sites in a Cytoscape network; import the site-specific methylation p-values; run jActiveModules. However, I disagree that a gene-level summary of the methylation data is of no biological value (and I certainly wouldn't cherry-pick the highest-variance probe; there is no reason to introduce a bias).
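The network-building step could be sketched roughly as below. This only produces an edge list that could then be loaded into a network tool; it does not reproduce the p-value import or the jActiveModules run. The function name `build_edges`, the use of a single coordinate per gene, and the toy positions are assumptions for illustration.

```python
def build_edges(sites, genes, n_neighbours=2):
    """Link each methylation site to its nearest gene and nearest other sites.

    sites: dict site_id -> (chrom, position)
    genes: dict gene -> (chrom, position); one coordinate per gene is assumed
    Distances are only compared within the same chromosome.
    Returns a sorted list of (node_a, node_b, edge_type) tuples.
    """
    edges = set()
    for sid, (chrom, pos) in sites.items():
        # Nearest gene on the same chromosome.
        gene_dists = [
            (abs(gpos - pos), gene)
            for gene, (gchrom, gpos) in genes.items()
            if gchrom == chrom
        ]
        if gene_dists:
            edges.add((sid, min(gene_dists)[1], "site-gene"))
        # The n nearest other sites on the same chromosome.
        site_dists = sorted(
            (abs(opos - pos), oid)
            for oid, (ochrom, opos) in sites.items()
            if oid != sid and ochrom == chrom
        )
        for _, oid in site_dists[:n_neighbours]:
            a, b = sorted((sid, oid))  # store undirected edges only once
            edges.add((a, b, "site-site"))
    return sorted(edges)

# Toy coordinates, invented for this example only.
sites = {
    "s1": ("chr1", 100),
    "s2": ("chr1", 200),
    "s3": ("chr1", 1_000),
    "s4": ("chr1", 1_100),
}
genes = {"G1": ("chr1", 150), "G2": ("chr1", 1_050)}

edges = build_edges(sites, genes)
```

This brute-force version compares every pair of sites, which is fine for a toy example; for all 450K probes one would sort positions per chromosome and look only at adjacent entries.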