Question

Statistics of methylation experiments



9 months ago

ramiro.barrantes

I am new to methylation experiments, and I am currently working on a project with 32 samples run on an Illumina Infinium EPIC array. About 15 samples received a treatment and the rest were untreated. Some samples might need to be removed because of very low quality.

In addition, when we fit a differential methylation model, we identify probes that are statistically significant at, for example, p < 0.01, but I am not sure whether that threshold is low enough.

My background is in statistics and I am trying to understand things such as:

1) Are 32 samples enough? Too many? Too few?
2) What are the implications of removing a sample?
3) What p-value thresholds do people use in these circumstances to identify differentially methylated probes?

Do you know where I can learn about general statistical questions such as these? I have been searching the literature and found some good material (e.g. this), but I am looking for more, so any advice is appreciated.



Answer 1 · 2024-09-09

The paper you linked is a good start. Here are a few other reviews about DNA methylation measurements, experimental designs, and statistical analysis that might be helpful:

Statistical methods for detecting differentially methylated loci and regions
Power and sample size estimation for epigenome-wide association scans to detect differential DNA methylation
Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data

The Illumina 450K and EPIC (aka 850K) arrays are technologically very similar, but the EPIC array covers more CpGs in the human genome.

Answer 2 · 2024-09-10

How are you running the differential methylation analysis: minfi, ChAMP, or something else? Something like minfi::dmpFinder() in R will report q-values, or you can correct the p-values yourself. You can also look at tools like DMRcate, which find differentially methylated regions (DMRs) rather than single probes; this helps capture regions such as CpG islands.
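To see why a raw cutoff of 0.01 is not low enough on its own: with roughly 850,000 probes on EPIC, 850,000 × 0.01 ≈ 8,500 probes would be expected to pass even if nothing were truly differentially methylated. Below is a minimal Python sketch of one common correction, Benjamini-Hochberg FDR adjustment (the same procedure as R's `p.adjust(method = "BH")`). It is not necessarily what dmpFinder() uses internally to produce its q-values, and the simulated null p-values are purely illustrative.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices of ascending p-values
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity: step down from the largest p-value, keeping running minima
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)    # cap at 1 and restore input order
    return adjusted


if __name__ == "__main__":
    # Illustrative global null: ~850,000 probes with no real differences
    rng = np.random.default_rng(0)
    null_p = rng.uniform(size=850_000)
    print("raw p < 0.01:", int((null_p < 0.01).sum()))           # on the order of 8,500
    print("BH q < 0.05:", int((bh_adjust(null_p) < 0.05).sum()))  # typically 0
```

The 0.05 FDR level is only an example; whichever level you choose, report it alongside the number of probes tested.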

Usually when I look at differential methylation it also helps to gauge the effect size, i.e. the difference in beta values between treated and control samples. You can calculate group-wise median beta values for each category and, as an additional filter, threshold on the size of the difference, much as you would with log fold change (LFC) in expression experiments.
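A minimal Python sketch of that combined filter, assuming a beta-value matrix with probes as rows and samples as columns, a boolean treatment vector, and per-probe q-values already computed. The cutoffs (q < 0.05, |Δβ| ≥ 0.1) are hypothetical defaults for illustration, not established thresholds.

```python
import numpy as np


def delta_beta(beta, is_treated):
    """Per-probe median beta in treated samples minus median beta in controls."""
    beta = np.asarray(beta, dtype=float)         # probes x samples
    is_treated = np.asarray(is_treated, dtype=bool)
    treated_med = np.median(beta[:, is_treated], axis=1)
    control_med = np.median(beta[:, ~is_treated], axis=1)
    return treated_med - control_med


def filter_dmps(qvals, dbeta, q_cut=0.05, min_abs_dbeta=0.1):
    """Mask of probes that pass the q-value cutoff AND show a large enough difference.

    q_cut and min_abs_dbeta are hypothetical example values.
    """
    return (np.asarray(qvals) < q_cut) & (np.abs(np.asarray(dbeta)) >= min_abs_dbeta)


if __name__ == "__main__":
    beta = np.array([[0.90, 0.80, 0.10, 0.20],   # large treated-vs-control shift
                     [0.52, 0.50, 0.49, 0.51]])  # essentially no shift
    treated = [True, True, False, False]
    d = delta_beta(beta, treated)
    print(d, filter_dmps([0.01, 0.01], d))
```

Using medians rather than means makes the difference less sensitive to a single outlying sample, which matters with group sizes around 15.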

32 samples across 2 groups sounds like a decent sample size for a basic differential experiment on 450K/EPIC. I don't think you could ever have "too many" samples in a differential experiment; the bigger concern is unevenly sized groups. You wouldn't want 28 and 4 samples in the two groups, to give a silly example.
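The cost of imbalance can be seen from the standard error of a two-group difference, which scales with √(1/n₁ + 1/n₂): 16 vs 16 gives √(1/8), while 28 vs 4 gives √(2/7), about 1.5 times larger. Below is a rough per-probe power sketch in Python using a normal approximation. It ignores the t distribution, correlation between probes, and the bounded nature of beta values, and the effect size (Δβ = 0.2), SD (0.1), and Bonferroni-style α are hypothetical; for real planning, see the power and sample size review listed above.

```python
from statistics import NormalDist


def approx_power(n1, n2, delta, sd, alpha):
    """Normal-approximation power of a two-sided two-sample test at one probe."""
    z = NormalDist()                         # standard normal
    se = sd * (1 / n1 + 1 / n2) ** 0.5       # standard error of the group difference
    shift = delta / se                       # standardized true difference
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    # Probability the test statistic lands beyond either critical value
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    alpha = 0.05 / 850_000                   # hypothetical Bonferroni-style cutoff
    for n1, n2 in [(16, 16), (28, 4)]:
        print(n1, n2, round(approx_power(n1, n2, delta=0.2, sd=0.1, alpha=alpha), 3))
```

Same total of 32 samples, but the balanced split has noticeably higher power to detect the same difference.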

But I wouldn't worry about removing samples, provided you have a good systematic rationale for doing so, such as a high probe failure rate or poor bisulfite conversion. A good first sanity check is to run something like a PCA and make sure the samples within each group look consistent and that you're not seeing batch effects or other confounding technical variability.
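A minimal Python sketch of both checks, assuming you have exported a detection p-value matrix (for example from minfi::detectionP() in R) and a beta-value matrix, each with probes as rows and samples as columns. The 0.01 detection cutoff is a commonly used example value; any per-sample failure threshold you then apply is your own choice. The PCA is plain SVD on probe-centered data.

```python
import numpy as np


def failed_probe_fraction(detection_p, p_cut=0.01):
    """Per-sample fraction of probes whose detection p-value exceeds p_cut."""
    return (np.asarray(detection_p, dtype=float) > p_cut).mean(axis=0)


def pca_scores(beta, n_components=2):
    """Sample scores on the top principal components, plus variance explained."""
    x = np.asarray(beta, dtype=float).T           # samples x probes
    x = x - x.mean(axis=0)                        # center each probe across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    beta = rng.normal(0.5, 0.01, size=(100, 6))   # simulated: 100 probes, 6 samples
    beta[:50, :3] += 0.3                          # first 3 samples differ on 50 probes
    scores, explained = pca_scores(beta)
    print(np.round(scores[:, 0], 2), np.round(explained, 3))
```

In real data you would colour the PC1/PC2 scores by treatment group and by processing batch (slide, array position, date); if samples separate by batch rather than by group, that is a warning sign.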

Methylation probes on the Illumina arrays are usually well annotated, including the probe type (Infinium type I or type II; there are tools for correcting for this if you feel it's necessary) and the genomic context of the site (gene body, promoter, open sea, etc.). Also be aware that not all the probes on the array measure CpG methylation: some target non-CpG (CpH) sites and have IDs starting with "ch" rather than "cg", a few are SNP probes ("rs"), and others sit on top of or very close to a known SNP in the population. DMRcate::rmSNPandCH() will help with filtering these out.
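If part of your pipeline runs outside R, the prefix-based part of that filtering is easy to reproduce. Below is a minimal Python sketch, assuming you already have a set of SNP-affected probe IDs from an annotation resource (the `snp_affected` set is a hypothetical input, and the probe IDs in the example are made up). It does not replicate everything rmSNPandCH() does, such as its minor allele frequency and distance options or cross-hybridising probe removal.

```python
def keep_cpg_probes(probe_ids, snp_affected=()):
    """Keep probes with a "cg" prefix that are not in the supplied SNP-affected set.

    snp_affected is a hypothetical input, e.g. probe IDs exported from an annotation.
    """
    flagged = set(snp_affected)
    return [pid for pid in probe_ids if pid.startswith("cg") and pid not in flagged]


if __name__ == "__main__":
    # Made-up IDs illustrating the three prefixes discussed above
    ids = ["cg00000001", "ch.1.0000001R", "rs0000001", "cg00000002"]
    print(keep_cpg_probes(ids, snp_affected={"cg00000002"}))
```

Filtering by ID prefix handles the "ch" and "rs" probes; the SNP-proximal "cg" probes can only be caught with external annotation, which is why the set is passed in explicitly.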