edger model matrix design of linear model
1
2
Entering edit mode
6.5 years ago
druggable ▴ 60

Hi,

I am trying to do differential expression based on RNA-seq data. I only have one variable, which is the timepoint. I want to analyze gene expression relative to the 0 h time point. I already set up the contrasts. Now, I am trying to set up the design matrix. What is the difference between

design <- model.matrix(~timepoint)

and

design <- model.matrix(~0 + timepoint).

There is a big difference in the results.

edger design matrix differential expression • 6.8k views
ADD COMMENT
0
Entering edit mode

I don't see a difference. I could be wrong.

Why not try both and compare the results?

ADD REPLY
0
Entering edit mode

Hi, there is a big difference in the results. The results using the design <- model.matrix(~timepoint) makes much more biological sense.

ADD REPLY
0
Entering edit mode

Yes I don't think model.matrix(~0 + timepoint) makes much sense. How can you use zero as a term in your model to explain differences?

ADD REPLY
10
Entering edit mode
6.5 years ago

The clue here is to look at the coefficients, or columns derived in the design matrix. Your non intercept model design, will create a coefficient that describes the difference between the two levels of your timepoint variable. Your non-intercept model will create a coefficient for each level. Here's an example using your non-intercept model:

library(tidyverse)
library(limma)
# Make a Phenotype Table
foo       <- data.frame(SampleType = paste0(rep(c("A","B"), each = 3)), 
                        Reps       = rep(1:3, 2)) %>%
             mutate(Names = paste0(SampleType,Reps))
# Make some fake gene expression data
genes.foo <- matrix(rnorm(6*100),ncol = 6) %>% 
             `rownames<-`(paste0("Gene_",1:100)) %>% 
             `colnames<-`(foo$Names)

#non-intercept model
model.foo <- model.matrix(~0 + SampleType, data = foo) %>% 
             `colnames<-`(gsub("SampleType","",colnames(.)))

model.foo
            A           B
1           1           0
2           1           0
3           1           0
4           0           1
5           0           1
6           0           1


conts.foo      <- c("A_Vs_B" = "A-B")
contmap.foo    <- makeContrasts(contrasts = conts.foo, levels = colnames(model.foo)) %>% 
                  `colnames<-`(names(conts.foo))
fit.foo     <- lmFit(genes.foo, model.foo) %>% 
               contrasts.fit(contmap.foo) %>% 
               eBayes


> fit.foo$coefficients %>% head
        Contrasts
             A_Vs_B
  Gene_1 -0.6722254
  Gene_2 -1.1571439
  Gene_3 -0.1021150
  Gene_4  0.8687436
  Gene_5 -0.9600460
  Gene_6  1.3139228

topTable(fit.foo, coef = "A_Vs_B", number = Inf, p.value = 0.05, lfc = log2(1.5))

Alternatively, here's your intercept model

model.foo <- model.matrix(~SampleType, data = foo)

fit.foo     <- lmFit(genes.foo, model.foo) %>% 
               eBayes

> model.foo
  (Intercept) SampleTypeB
1           1           0
2           1           0
3           1           0
4           1           1
5           1           1
6           1           1

# Here SampleTypeB is actually the difference relative to the first level (SampleTypeA)
> fit.foo$coefficients %>% head
       (Intercept) SampleTypeB
Gene_1  -0.2920748   0.6722254
Gene_2  -1.0356549   1.1571439
Gene_3   0.2477702   0.1021150
Gene_4   0.5016585  -0.8687436
Gene_5   0.3153554   0.9600460
Gene_6   0.2836491  -1.3139228

topTable(fit.foo, coef = "SampleTypeB", number = Inf, p.value = 0.05, lfc = log2(1.5))

edit: Changed the data frame initialisation.

ADD COMMENT
0
Entering edit mode

in line 4, it should be: foo <- data.frame(SampleType = paste0(rep(c("A","B"), each = 3)), Reps = rep(1:3, 2), Names = paste0(rep(c("A","B"), each=3), rep(1:3, 2) ) )

set.seed() before generating the data will be helpful.

ADD REPLY
0
Entering edit mode

You're right, the data frame initialisation was wrong. I mostly wrote this as pseudocode and ran bits in R...I probably should have written that as a disclaimer. Setting the seed has benefits in reproducibility, but as it's proof of concept, and dummy data, it doesn't really matter here.

ADD REPLY

Login before adding your answer.

Traffic: 1961 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6