Hi everyone,
I have several DNA methylation microarray datasets from the Illumina 450K and EPIC (850K) platforms.
I would like to combine all of these to perform a meta-analysis. I have raw data for all of them.
I have perused the community for ways to combine array data from different platforms (and there are a lot of them), and the only way to do this seems to be to keep only the CpGs present on both platforms, thereby discarding about half of the 850K probes and about 9% of the 450K probes (see "how to combine methylation information from different platform?" and "Methylation normalization between samples from different platform").
Some packages are mentioned in those threads, such as:
- methylumi for combining 450K and 27K data (Tcga Illumina Methylation Combining 27K And 450K). This only looks at the common probes between 27K and 450K.
- methyLiftover, which can be used to map RRBS or WGBS data to 450K or 850K data, but again, it drops A LOT of data from the whole-genome techniques and captures only the common CpGs.
- RnBeads (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1664-9). This package supports cross-platform data integration and analysis, but only looks at commonly present probes (combining 450K, 850K, and WGBS data results in 400,000 shared probes).
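As far as I understand it, the common-probe approach these packages take boils down to something like the sketch below. It assumes each dataset is already reduced to beta values keyed by probe ID; the helper name `keep_shared_probes`, the probe IDs, and the values are all made up for illustration and are not taken from any of the packages above.

```python
# Toy beta values per dataset: {probe_id: {sample_id: beta}}.
# All probe IDs, sample IDs, and values are hypothetical.
betas_450k = {
    "cg0001": {"s1": 0.10, "s2": 0.12},
    "cg0002": {"s1": 0.80, "s2": 0.75},
    "cg0003": {"s1": 0.55, "s2": 0.60},
}
betas_epic = {
    "cg0002": {"s3": 0.82, "s4": 0.79},
    "cg0003": {"s3": 0.50, "s4": 0.58},
    "cg0004": {"s3": 0.30, "s4": 0.25},
}


def keep_shared_probes(*datasets):
    """Keep only probes present in every dataset and merge their samples."""
    shared = set(datasets[0])
    for dataset in datasets[1:]:
        shared &= set(dataset)

    combined = {}
    for probe in sorted(shared):
        merged_samples = {}
        for dataset in datasets:
            merged_samples.update(dataset[probe])
        combined[probe] = merged_samples
    return combined


combined = keep_shared_probes(betas_450k, betas_epic)
dropped = (set(betas_450k) | set(betas_epic)) - set(combined)
print(sorted(combined))  # probes kept
print(sorted(dropped))   # platform-specific probes that are lost
```

In this toy case the probes unique to one platform (`cg0001` and `cg0004`) are simply thrown away, which is the data loss I am asking about.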
What I would like to ask is why we cannot simply add up the methylated and unmethylated counts from each dataset (sorry, I am bad at statistics).
Naively, I would assume that some sort of within-dataset normalization and filtering is performed for each dataset, and then, for each CpG site, the reads can simply be summed. This would give the commonly present probes more read depth, while the probes unique to one platform would be processed just as they normally would be. What is preventing us from doing this and using the resulting data?
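To make the idea concrete, here is a minimal sketch of the bookkeeping I have in mind. It assumes each dataset can be reduced to one (methylated, unmethylated) pair of counts per CpG, which is itself an assumption for array data; the function names `pool_counts` and `beta` and all numbers are hypothetical. It only shows the summing, not whether the result is statistically valid, which is exactly my question.

```python
from collections import defaultdict

# Toy per-CpG values: {cpg_id: (methylated, unmethylated)}.
# All IDs and numbers are hypothetical.
counts_450k = {"cg0001": (30, 70), "cg0002": (80, 20)}
counts_epic = {"cg0002": (40, 10), "cg0003": (5, 45)}


def pool_counts(*datasets):
    """Sum methylated and unmethylated values per CpG across datasets.

    CpGs found in several datasets gain depth; CpGs unique to one
    dataset are carried over unchanged.
    """
    pooled = defaultdict(lambda: [0, 0])
    for dataset in datasets:
        for cpg, (meth, unmeth) in dataset.items():
            pooled[cpg][0] += meth
            pooled[cpg][1] += unmeth
    return {cpg: (meth, unmeth) for cpg, (meth, unmeth) in pooled.items()}


def beta(meth, unmeth):
    """Methylation level as methylated / (methylated + unmethylated)."""
    total = meth + unmeth
    return meth / total if total else float("nan")


pooled = pool_counts(counts_450k, counts_epic)
for cpg in sorted(pooled):
    meth, unmeth = pooled[cpg]
    print(cpg, meth, unmeth, round(beta(meth, unmeth), 3))
```

Here the shared CpG `cg0002` ends up with the summed values from both datasets, while `cg0001` and `cg0003` keep the values from the single dataset that contains them.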