I'm analyzing gene expression data from an experiment where I have (i) 6 different mice (A:D); (ii) 2 different time points ("pre" and "post" treatment); and (iii) 2 different response types ("responder" vs. "non-responder"). I'm trying to find genes with differential expression after treatment in the responders vs. non-responders. In other words: if I look at post therapy gene expression changes compared to 'pre', which genes are differentially expressed in responders compared to non-responders? Additionally, how can I include the 'paired' aspect into my analysis, where I might get more meaningful information by running the above analysis in a paired, rather than general, manner?
I've spent an entire day reading tutorials on designing design matrices for differential expression but I can't pin down the details, I'm hoping someone with experience might be able to help out.
A small reproducible example:
ms = factor(rep(c("A", "B", "C", "D", "E", "F"), each = 2)) # Mouse ID
rx = factor(rep(c("Pre", "Post"), 6)) # Pre or Post treatment
rp = factor(c(rep("NR", 6), rep("R", 6))) # Is the mouse a 'responder' or 'non-responder'?
data.frame(ms, rx, rp)
ms rx rp
1 A Pre NR
2 A Post NR
3 B Pre NR
4 B Post NR
5 C Pre NR
6 C Post NR
7 D Pre R
8 D Post R
9 E Pre R
10 E Post R
11 F Pre R
12 F Post R
The ideal analysis that I'd like to do is something like the following, but including the paired design, which would hopefully make the analysis more powerful:
(Post / Pre in responders) / (Post / Pre in non-responders)
The first term ("Post / Pre in responders") would find DE genes in responders, and by dividing by the second term ("Post / Pre in non-responders") I would ideally get genes which are diferentially expressed only in responders.
I tried running the "Post / Pre" analysis in responders and in non-responders separately, and manually dividing, as in the code above, but it doesn't feel right, and there must be a true way of running that with a properly designed matrix.