Question

How to create a design matrix for cpg.annotate?

7.7 years ago

c.ryder3 ▴ 40

> head(ICGC_2)
              naive.1   memoryCS.1   naive.2   memoryCS.2   naive.3  memoryCS.3
cg00000029  0.6199970   0.5703951  0.6383819   0.5831206  0.7012571  0.6000816
cg00000108  0.9083578   0.9105157  0.9030611   0.9103147  0.9115842  0.8947593
cg00000109  0.8694214   0.7525098  0.8478160   0.7725212  0.8645145  0.7636347
cg00000165  0.1911901   0.3050081  0.1810569   0.3750369  0.2250429  0.3094155
cg00000236  0.8666489   0.8382011  0.8586420   0.8369283  0.8860430  0.8439371
cg00000289  0.6653662   0.5512665  0.5815338   0.4773868  0.6254710  0.5408634

Above is a snippet of a data frame I have in R that contains 450K methylation beta values for 6 samples, 3 of which are from naive B cells and 3 of which are from memory class-switched B cells.

I would eventually like to identify differentially methylated genomic regions in the naive samples compared to the memoryCS samples using the Bioconductor package DMRcate.

However, I'm stuck on creating a design matrix for cpg.annotate. I've tried following the workflow available at https://www.bioconductor.org/help/workflows/methylationArrayAnalysis/ but it doesn't explain clearly how to create a design matrix.

Can anyone explain how I can go about creating a design matrix that will allow me to compare the naive and memoryCS samples?

Thank you

R Bioconductor DMRcate cpg.annotate • 2.6k views

updated 7.7 years ago by e.rempel ★ 1.1k • written 7.7 years ago by c.ryder3 ▴ 40

score 3 · Accepted Answer · 2017-08-19

I will provide a short answer here, but I strongly recommend reading more about statistics, and linear models in particular.

In the workflow you mention, the authors use the factors of interest to build the design matrix with the function model.matrix. In your case, the factor of interest is the cell type: naive or memoryCS. So you can create your design matrix like this:

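# Cell type of each column of ICGC_2, in column order (naive, memoryCS, naive, ...)
type_cells <- factor(rep(c("naive", "memoryCS"), 3), levels = c("naive", "memoryCS"))
# model.matrix() comes from the stats package, not limma; ~0 drops the intercept
design <- model.matrix(~0 + type_cells)
colnames(design) <- c("naive", "memoryCS")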
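
As a sanity check, the same design matrix can also be derived from the column names instead of being typed by hand, so that its rows are guaranteed to line up with the columns of the data. This is only a sketch: `sample_names` is a hypothetical vector copied from the head(ICGC_2) output in the question, standing in for colnames(ICGC_2).

```r
# Column names copied from the head(ICGC_2) output in the question
# (assumption: on the real data this would be colnames(ICGC_2)).
sample_names <- c("naive.1", "memoryCS.1", "naive.2", "memoryCS.2",
                  "naive.3", "memoryCS.3")

# Strip the trailing ".<replicate>" to recover the cell type of each column.
type_cells <- factor(sub("\\.[0-9]+$", "", sample_names),
                     levels = c("naive", "memoryCS"))

# One indicator column per cell type, no intercept.
design <- model.matrix(~ 0 + type_cells)
colnames(design) <- levels(type_cells)
rownames(design) <- sample_names

print(design)
```

Each row should contain a single 1, in the column of the cell type that sample belongs to.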

Then you can fit a linear model to your data:

library(limma)  # provides lmFit()
fit <- lmFit(ICGC_2, design)
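
To make the comparison itself concrete, here is a minimal sketch of the naive versus memoryCS contrast, assuming limma is installed. The matrix `beta` holds only the six probes shown in the question, and the contrast name `memoryCS_vs_naive` is made up for illustration; on the real data you would use the full ICGC_2 object.

```r
library(limma)

# The six probes printed in the question (rows = probes, columns = samples).
beta <- matrix(c(
  0.6199970, 0.5703951, 0.6383819, 0.5831206, 0.7012571, 0.6000816,
  0.9083578, 0.9105157, 0.9030611, 0.9103147, 0.9115842, 0.8947593,
  0.8694214, 0.7525098, 0.8478160, 0.7725212, 0.8645145, 0.7636347,
  0.1911901, 0.3050081, 0.1810569, 0.3750369, 0.2250429, 0.3094155,
  0.8666489, 0.8382011, 0.8586420, 0.8369283, 0.8860430, 0.8439371,
  0.6653662, 0.5512665, 0.5815338, 0.4773868, 0.6254710, 0.5408634),
  nrow = 6, byrow = TRUE,
  dimnames = list(
    c("cg00000029", "cg00000108", "cg00000109",
      "cg00000165", "cg00000236", "cg00000289"),
    c("naive.1", "memoryCS.1", "naive.2", "memoryCS.2",
      "naive.3", "memoryCS.3")))

# Design matrix with one column per cell type, no intercept.
type_cells <- factor(rep(c("naive", "memoryCS"), 3),
                     levels = c("naive", "memoryCS"))
design <- model.matrix(~ 0 + type_cells)
colnames(design) <- levels(type_cells)

# Fit the per-probe linear model, then ask for memoryCS minus naive.
fit <- lmFit(beta, design)
cont_matrix <- makeContrasts(memoryCS_vs_naive = memoryCS - naive,
                             levels = design)
fit2 <- contrasts.fit(fit, cont_matrix)

# Estimated difference in mean beta value for each probe.
print(fit2$coefficients)
```

With this no-intercept design each coefficient is a group mean, so the contrast is simply the difference in mean beta value (memoryCS minus naive) for each probe, about 0.13 for cg00000165.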

From there you can continue with the rest of the workflow.

HTH