I have done a significant amount of reading over the past few days and am still uncertain about proper model matrix design for paired sample analysis. I am starting with a limma/voom object v
that I would like to assess for transcriptional differences via RNA-seq between disease subtypes and their respective matched normal tissues. Thus I would like to model the effects of both Tissue
and Subject
.
Regarding the model matrix, I would assume to use design <- model.matrix(~0+Subject+Tissue)
. My understanding is that this would consider an intercept for Tissue
, where Normal is always the reference, but there would be no reference Subject
(due to no intercept for subject).
Note this approach elicits slightly different results than design <- model.matrix(~0+Tissue+Subject)
. The other alternative I have considered is design <- model.matrix(~Tissue+Subject)
which considers an intercept for both Tissue
and Subject
. However, I don't want a reference Subject
, so I think this design is not appropriate.
Here's my workflow in R:
# consider limma/voom object v
# make paired design matrix
exp.design <- data.frame(c("S1.Disease1", "S1.Normal",
"S2.Disease1", "S2.Normal",
"S3.Disease1", "S3.Disease2", "S3.Normal",
"S4.Disease1", "S4.Disease2", "S4.Normal",
"S5.Disease1", "S5.Disease2", "S5.Normal",
"S6.Disease1", "S6.Normal",
"S7.Disease1", "S7.Normal",
"S8.Disease1", "S8.Normal",
"S9.Disease1", "S9.Normal"))
colnames(exp.design) <- "Sample"
exp.design$subject <- c(rep("S1", 2),
rep("S2", 2),
rep("S3", 3),
rep("S4", 3),
rep("S5", 3),
rep("S6", 2),
rep("S7", 2),
rep("S8", 2),
rep("S9", 2))
exp.design$tissue <- c('Disease.subtype1', 'Normal', # S1
'Disease.subtype1', 'Normal', # S2
'Disease.subtype3', 'Disease.subtype1', 'Normal', # S3
'Disease.subtype3', 'Disease.subtype3', 'Normal', # S4
'Disease.subtype1', 'Disease.subtype1', 'Normal', # S5
'Disease.subtype2', 'Normal', # S6
'Disease.subtype2', 'Normal', # S7
'Disease.subtype1', 'Normal', # S8
'Disease.subtype1', 'Normal' # S9
)
Subject <- factor(exp.design$subject)
Tissue <- factor(exp.design$tissue, levels=c("Normal", "Disease.subtype1", "Disease.subtype2", "Disease.subtype3"))
# design model matrix
design <- model.matrix(~0+Subject+Tissue)
v$design <- design
# contrast matrix
contr.matrix <- makeContrasts(DiseaseSubtype1vsNormal = TissueDisease.subtype1,
DiseaseSubtype2vsNormal = TissueDisease.subtype2,
DiseaseSubtype3vsNormal = TissueDisease.subtype3,
levels = colnames(design))
# fit linear models etc.
vfit <- lmFit(v, v$design)
vfit <- contrasts.fit(vfit, contrasts=contr.matrix)
efit <- eBayes(vfit)
# prepare DGE lists
DiesaseSubtype1.vs.Normal <- topTable(efit, coef=1, n=Inf)
DiesaseSubtype2.vs.Normal <- topTable(efit, coef=2, n=Inf)
DiesaseSubtype3.vs.Normal <- topTable(efit, coef=3, n=Inf)
# subset by FDR < 0.05 genes
DiesaseSubtype1.vs.Normal.sig <- subset(DiesaseSubtype1.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype2.vs.Normal.sig <- subset(DiesaseSubtype2.vs.Normal, adj.P.Val < 0.05)
DiesaseSubtype3.vs.Normal.sig <- subset(DiesaseSubtype3.vs.Normal, adj.P.Val < 0.05)
I would much appreciate any insight. Thanks!
Hi. I need to perform DGE between paired samples overtine. Did you reach a conclusion about the best model matrix?
design <- model.matrix(~0+Tissue+Subject)
ordesign <- model.matrix(~Tissue+Subject)
? I checked the Section 3.4.1 (Paired samples) of the edgeR user's guide package and it indicates that we I should usedesign <- model.matrix(~Tissue+Subject)
for paired sampes, but I don't know if is the best model to perform DGE between paired samples overtine.This question/answer on biostars might be helpful.