Dear all,
I am working on a large DNA methylation dataset (Illumina EPIC array) that comprises 1300 samples from about 900 individuals, with longitudinal measures for 400 individuals. For a differential methylation analysis in a specific cross-sectional subset of these samples (n=600), I am wondering which workflow makes more sense:
a) Perform preprocessing and quantile normalization on the full dataset, then subset afterwards
b) Subset the raw data first, then perform preprocessing and quantile normalization on the subset only
For the normalization, minfi::preprocessQuantile() is used.
Is there a recommended way to do this? Thanks for any help!
In my opinion, it is best practice to normalize all of your samples together (workflow a) and then, in the differential analysis, use contrasts to test the comparison you are interested in.
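
To make the difference between (a) and (b) concrete, here is a minimal toy sketch in plain Python. It is not minfi and not the actual preprocessQuantile() algorithm, which does more than this. It only implements basic quantile normalization on made-up numbers to show that the normalized values of a sample depend on which other samples were normalized with it. The sample names, the values, and the helper quantile_normalize() are all hypothetical.

```python
from statistics import mean


def quantile_normalize(samples):
    """Basic quantile normalization of {sample name: list of probe values}.

    Every value is replaced by the mean, across samples, of the values
    that hold the same rank. Ties are broken by probe position, which is
    a simplification compared to real implementations.
    """
    names = list(samples)
    n_probes = len(samples[names[0]])

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(samples[name]) for name in names]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_probes)]

    normalized = {}
    for name in names:
        values = samples[name]
        # Probe indices ordered from smallest to largest value.
        order = sorted(range(n_probes), key=lambda i: values[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized[name] = out
    return normalized


# Made-up intensities: four hypothetical samples, four probes each.
raw = {
    "s1": [0.10, 0.50, 0.90, 0.30],
    "s2": [0.20, 0.60, 0.80, 0.40],
    "s3": [0.40, 0.90, 1.50, 0.70],
    "s4": [0.05, 0.20, 0.60, 0.10],
}
subset = ["s1", "s2"]

# Workflow (a): normalize all samples together, subset afterwards.
normalized_all = quantile_normalize(raw)
full_then_subset = {name: normalized_all[name] for name in subset}

# Workflow (b): subset the raw data first, normalize only the subset.
subset_then_norm = quantile_normalize({name: raw[name] for name in subset})

print("a) s1:", full_then_subset["s1"])
print("b) s1:", subset_then_norm["s1"])
```

The two printed lines differ because the reference distribution in (a) is built from all four samples and in (b) from only two. In (a), every sample in the full dataset ends up on the same scale, so the same normalized data can be reused for other subsets or for the longitudinal samples later.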