How to create a proper design matrix/contrast matrix for methylation 850k analysis in limma?
4.9 years ago
mario.red8976

Hi everyone, I'm totally new to bioinformatics but I'm trying to learn. I recently received data from an Illumina 850k DNA methylation array and I'm trying to analyse it in R. At the moment I'm following the workflow published in this paper: https://f1000research.com/articles/5-1281 (also available on Bioconductor as a workflow package: https://bioconductor.org/packages/release/workflows/vignettes/methylationArrayAnalysis/inst/doc/methylationArrayAnalysis.html). The problem is that I got stuck on the design matrix and contrast matrix needed for the statistical analysis.

Let me explain the experiment. My samples are the same cell line under four different conditions, each in biological triplicate. I combine two manipulations: upregulation of a gene through doxycycline induction, which gives induced (IN) and non-induced (NI) samples; and silencing of a gene through siRNA, which gives si-Scramble (S) samples and samples with an siRNA that downregulates my target (T).

My four conditions are:

1. S - NI

2. S - IN

3. T - NI

4. T - IN

Each condition is in triplicate (replicates 1, 2 and 3).

I want to check for differences between the replicates but, most importantly, I want to find all the differences relative to my control, S - NI (scramble, non-induced), and also between all the conditions, to check the roles of the induced gene and of the silenced one. I'm not confident with this type of statistics, so I can't work out how to set up the design matrix and the contrast matrix.

Sorry for the long post, but I wanted to explain as much as I can. If someone can help me, it would be really great; I'm also reading the limma User's Guide, but I'm still not sure what I have to do. Thank you in advance for your answers! P.S. I can also post some code if it helps.


Thank you for your answer! I probably explained the experiment badly. My four samples are:

Not induced: si-Ctrl, si-Gene

Induced: si-Ctrl, si-Gene

This way I also have a sample in which the effects of the two conditions are combined. But I think the design matrix will be more or less similar to what you wrote, right?

Following the tutorial, I tried something like this:

library(limma)

# this is the factor of interest: si-Ctrl, si-Gene, Ind+si-Ctrl, Ind+si-Gene
# (assumes Sample_Group holds exactly these four strings; the labels are
# syntactic names, because makeContrasts() cannot parse "Ind+si-Ctrl" or spaces)
cellType <- factor(targets$Sample_Group,
                   levels = c("si-Ctrl", "si-Gene", "Ind+si-Ctrl", "Ind+si-Gene"),
                   labels = c("Ctrl", "Gene", "IndCtrl", "IndGene"))
# this is the replicate effect that we need to account for: R1, R2, R3
replicate <- factor(targets$Sample_Source)

# create the design matrix
design <- model.matrix(~0 + cellType + replicate, data = targets)
colnames(design) <- c(levels(cellType), levels(replicate)[-1])  # why levels(replicate)[-1]? I lose R1

# create a contrast matrix for specific comparisons (each is first group minus second)
contMatrix <- makeContrasts(Ctrl-Gene,
                            Ctrl-IndCtrl,
                            Ctrl-IndGene,
                            Gene-IndCtrl,
                            Gene-IndGene,
                            IndCtrl-IndGene,
                            levels = design)  # here I wrote every pairwise comparison; is it right?
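To see why the `[-1]` is there, here is a small sketch with a hypothetical `targets` table built to mirror the post (four groups, three replicates; the group strings come from the code comment, everything else is assumed). Because the formula has no intercept, `model.matrix()` gives `cellType` one column per level, while the first `replicate` level (R1) becomes the baseline absorbed into those columns, so only R2 and R3 get columns. Adding an R1 column back would not add any information: the design would become rank-deficient.

```r
# Hypothetical sample sheet mirroring the post: 4 groups x 3 replicates
targets <- data.frame(
  Sample_Group  = rep(c("si-Ctrl", "si-Gene", "Ind+si-Ctrl", "Ind+si-Gene"), each = 3),
  Sample_Source = rep(c("R1", "R2", "R3"), times = 4)
)

# Syntactic labels so the group names can be used inside makeContrasts()
cellType <- factor(targets$Sample_Group,
                   levels = c("si-Ctrl", "si-Gene", "Ind+si-Ctrl", "Ind+si-Gene"),
                   labels = c("Ctrl", "Gene", "IndCtrl", "IndGene"))
replicate <- factor(targets$Sample_Source)

design <- model.matrix(~0 + cellType + replicate)
default_names <- colnames(design)
print(default_names)
# 4 cellType columns (one per group) plus replicateR2 and replicateR3:
# R1 is the baseline, absorbed into the cellType columns

# An explicit R1 indicator is redundant: the cellType columns sum to 1 in every
# row, and so do R1 + R2 + R3, so the rank does not increase
full <- cbind(design, replicateR1 = as.numeric(replicate == "R1"))
cat("rank of design:", qr(design)$rank, "with", ncol(design), "columns\n")
cat("rank with R1 added:", qr(full)$rank, "with", ncol(full), "columns\n")

# Hence only R2 and R3 are named after the group names
colnames(design) <- c(levels(cellType), levels(replicate)[-1])
print(colnames(design))
```

So nothing is lost: the R2 and R3 coefficients are shifts relative to R1.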

I've updated my answer.


I will try, thank you very much!! 😁

4.9 years ago

So, as far as I can tell, your design looks something like this:

Sample    Induction    siRNA      Replicate
1         True         siControl  R1
2         True         siControl  R2    
3         True         siControl  R3
4         True         siGene     R1
5         True         siGene     R2
6         True         siGene     R3
7         False        siControl  R1   
8         False        siControl  R2  
9         False        siControl  R3  
10        False        siGene     R1
11        False        siGene     R2
12        False        siGene     R3

Let's imagine that's what your sample_info data frame looks like.

Here is how I would do it.

First, set Induction = False and siRNA = siControl as the reference levels, so that the other conditions are compared against them.

# reference levels first; assumes Induction holds the strings "False"/"True"
sample_info$Induction <- factor(sample_info$Induction, levels = c("False", "True"))
sample_info$siRNA <- factor(sample_info$siRNA, levels = c("siControl", "siGene"))
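One caveat with these `factor()` calls, sketched under the assumption that the sample sheet is read from a CSV file: `read.csv()` converts a column of `True`/`False` strings to R logicals, and `factor()` then compares `"TRUE"`/`"FALSE"` against the levels `"False"`/`"True"`, so every value becomes `NA`. The four-row sheet below is hypothetical; reading that column as character keeps the level names matching.

```r
# Hypothetical four-row excerpt of a sample sheet
sheet <- "Sample,Induction,siRNA
1,True,siControl
4,True,siGene
7,False,siControl
10,False,siGene"

# Default reading: "True"/"False" are converted to logical TRUE/FALSE
auto <- read.csv(text = sheet)
print(class(auto$Induction))
bad <- factor(auto$Induction, levels = c("False", "True"))
print(bad)  # "TRUE"/"FALSE" do not match "True"/"False", so all values are NA

# Keep the column as character so the level names match
sample_info <- read.csv(text = sheet, colClasses = c(Induction = "character"))
sample_info$Induction <- factor(sample_info$Induction, levels = c("False", "True"))
sample_info$siRNA <- factor(sample_info$siRNA, levels = c("siControl", "siGene"))
print(sample_info$Induction)
```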

Then your design matrix should be:

# main effects plus interaction; equivalent to ~ Induction * siRNA
design <- model.matrix(~Induction + siRNA + siRNA:Induction, data=sample_info)

This should give you a design matrix with four coefficients: (Intercept), InductionTrue, siRNAsiGene and InductionTrue:siRNAsiGene.
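A quick way to check those column names, and to see what each coefficient means, is to build the design from the example table. The `sample_info` data frame below is a hypothetical reconstruction of that table, with `Induction` stored as strings. Each row of the printed matrix shows which coefficients add up to that group's mean: the non-induced siControl group (S - NI in the question) is the intercept alone, induced siControl adds `InductionTrue`, non-induced siGene adds `siRNAsiGene`, and induced siGene adds all three.

```r
# Example table from the answer, with Induction stored as strings
sample_info <- data.frame(
  Sample    = 1:12,
  Induction = rep(c("True", "False"), each = 6),
  siRNA     = rep(rep(c("siControl", "siGene"), each = 3), times = 2),
  Replicate = rep(c("R1", "R2", "R3"), times = 4)
)

# Reference levels first: non-induced and siControl
sample_info$Induction <- factor(sample_info$Induction, levels = c("False", "True"))
sample_info$siRNA <- factor(sample_info$siRNA, levels = c("siControl", "siGene"))

design <- model.matrix(~Induction + siRNA + siRNA:Induction, data = sample_info)
print(colnames(design))

# One design row per group: which coefficients sum to that group's mean
groups <- paste(sample_info$Induction, sample_info$siRNA, sep = "/")
design_by_group <- design[!duplicated(groups), ]
rownames(design_by_group) <- groups[!duplicated(groups)]
print(design_by_group)
```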

You then don't need any contrasts to test these effects. The InductionTrue coefficient tells you the effect of the induction in the siControl samples. The siRNAsiGene coefficient tells you the effect of the knockdown in the non-induced samples, and the InductionTrue:siRNAsiGene coefficient tells you how the effect of the knockdown differs between the induced and non-induced conditions (or, equivalently, how the effect of the induction differs between the knockdown and siControl samples).
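Fitting this design in limma then comes down to passing a coefficient name to `topTable()`. The sketch below is hedged: it uses simulated M-values in place of real data (in the workflow these would come from `getM()` on the filtered methylation object), and the probe names are hypothetical. It also shows one comparison the question asks for, T - IN versus the S - NI control. Under this parameterisation that difference is the sum of the last three coefficients, so it is not a single coefficient and still needs `contrasts.fit()`.

```r
library(limma)

set.seed(1)
# Same hypothetical sample table as in the answer
sample_info <- data.frame(
  Induction = factor(rep(c("True", "False"), each = 6), levels = c("False", "True")),
  siRNA     = factor(rep(rep(c("siControl", "siGene"), each = 3), times = 2),
                     levels = c("siControl", "siGene"))
)
design <- model.matrix(~Induction + siRNA + siRNA:Induction, data = sample_info)

# Simulated M-values: 1000 hypothetical probes x 12 samples (not real data)
mVals <- matrix(rnorm(1000 * 12), nrow = 1000,
                dimnames = list(paste0("probe", 1:1000), paste0("S", 1:12)))

fit <- lmFit(mVals, design)
fit_eb <- eBayes(fit)

# Each coefficient can be tested directly by name
top_int <- topTable(fit_eb, coef = "InductionTrue:siRNAsiGene", number = 5)
print(top_int)

# T-IN vs S-NI = InductionTrue + siRNAsiGene + InductionTrue:siRNAsiGene
fit_tin <- eBayes(contrasts.fit(fit, contrasts = c(0, 1, 1, 1)))
top_tin <- topTable(fit_tin, number = 5)
print(top_tin)
```

Calling `contrasts.fit()` on the `lmFit()` output and running `eBayes()` afterwards follows the usual limma order.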
