Let's suppose your various phenotypic factors are stored in a data.frame
called treatments
library(magrittr) # %>% piping
treatments <- data.frame(
pheno = rep(c('Ctrl', 'Ctrl', 'Ctrl', 'Disease', 'Disease'), each = 4),
cell = rep(paste('A', 1:4, sep = ''), times = 5),
pt = rep(paste('ID', 1:5, sep = ''), each = 4)
)
treatments %>% head
# pheno cell pt
# 1 Ctrl A1 ID1
# 2 Ctrl A2 ID1
# ...
# 6 Ctrl A2 ID2 .....
I find it easier to separate out the cellular / disease parts from the patient-matching part when building designs for this type of study (I've done them quite a lot for leukaemia datasets with various cell subpopulations over the last 18 months). So we define
# All combns: disease state and cell subpop:
# - ie, effect of disease is not assumed to be constant in each different cell type
pheno.design <- model.matrix(~ cell * pheno, data = treatments)
# No Intercept term in patient.design,
# just an indicator column for each person
patient.design <- model.matrix(~ -1 + pt, data = treatments)
> pheno.design %>% head(3)
# (Intercept) cellA2 cellA3 cellA4 phenoDisease cellA2:phenoDisease
#1 1 0 0 0 0 0
#2 1 1 0 0 0 0
#3 1 0 1 0 0 0
# cellA3:phenoDisease cellA4:phenoDisease
#1 0 0
#2 0 0
#3 0 0 ...
> patient.design %>% head
# ptID1 ptID2 ptID3 ptID4 ptID5
#1 1 0 0 0 0
#2 1 0 0 0 0
# ...
#6 0 1 0 0 0 ...
Now, you can use the following as your design matrix in limma, if you want to then struggle with setting up your contrast matrix:
binary.design <- cbind(pheno.design, patient.design[, c(2,3,5)])
colnames(binary.design) <- c("Intercept", make.names(colnames(binary.design))[-1])
Why will your contrasts matrix be ugly if you use this design?
Let's suppose you're interested in comparing disease vs control in cell type A2.
Algebraically, the contrast you want is:
(fitted val for cell type A2, Dis) - (fitted val for cell type A2, Ctrl)
The design takes [cell type A1, control status, patient ID1] as baseline (the value that is fitted when only the intercept term is included). If you include the coefficient for 'phenoDisease' as well, you will fit the disease-baseline sample [cell type A1, disease status, patient ID4].
To get the fitted value for [cell type A2, disease status] we compute the mean of the fitted values for the three different control individuals:
[Fitted val: A2, Control, ID1] = Intercept + cellA2;
[A2, Ctrl, ID2] = Intercept + cellA2 + ptID2
[A2, Ctrl, ID3] = Intercept + cellA2 + ptID3
=> [A2, Ctrl] = Intercept + cellA2 + (ptID2 + ptID3)/3
Similarly, for cell type A2 in the disease samples:
=> [A2, Disease] = Intercept + cellA2 + phenoDisease + cellA2.phenoDisease + ptID5 / 2
So the contrast you want to define is
[Contrast: Dis vs Ctrl in A2] =
[A2, Disease] - [A2, Control] =
phenoDisease + cellA2.phenoDisease + (ptID5 / 2) - (ptID2 + ptID3) / 3
You don't see these kinds of ugly contrasts for binary model matrices in studies where there's inter-treatment matching (say you had three patients, 4 cell subpops from each and treated the 4 cellpops +/- drug giving 24 samples; the ptID coefficients would cancel out in the definition of your contrast).
You might find it neater to encode the patient matrix as orthogonal differences between the patients in each experimental arm. This removes the ptID terms from the final contrast matrix (they are put inside the model matrix, rather than in the contrast matrix; and is much easier to explain to the bench)
pt.delta.matrix <- patient.design %*% matrix(
c(c(1, -1/2, -1/2, 0, 0), # diff ID1 vs mean of ID2/3
c(0, 1, -1, 0, 0), # diff ID2 vs ID3
c(0, 0, 0, 1, -1)), # diff ID4 vs ID5
ncol = 3,
dimnames = list(c(), c("pt.delta1.23", "pt.delta2.3", "pt.delta4.5"))
)
> pt.delta.matrix
# pt.delta1.23 pt.delta2.3 pt.delta4.5
#1 1.0 0 0
#2 1.0 0 0
# ...
#6 -0.5 1 0
# ...
delta.design <- cbind(pheno.design, pt.delta.matrix)
colnames(delta.design) <- c("Intercept", make.names(colnames(delta.design))[-1])
The contrast I tried to explain above then comes out as:
[Contrast: Dis vs Ctrl in A2] =
[A2, Disease] - [A2, Control] =
phenoDisease + cellA2.phenoDisease
Apologies for the long-winded explanation.
Thanks for the great answer and time you put into this response! I have one question, can I still use this if patients have a systemic disease, so 4 cell types and all disease vs 4 cell types and all control? Again, thanks for all the effort it helped me a ton!
Yeah you could do that. If I've understood correctly, and you've set up the design as in delta.design, I think your contrast would be phenoDisease + (cellA2.phenoDisease + cellA3.phenoDisease + cellA4.phenoDisease)/4.