I am planning to run several differential expression analyses on the same data about the progression of a disease, changing the design of the linear model each time. I would like to compare the different phases of the disease with the normal/control data. The statistics from each comparison will be used for a GSEA. The most similar approach I have found is the time course example in the limma vignette (section 9.6 of the current version).
Approach 1
Adjusting for each phase and the controls:
Sample | Normal | Phase 1
A | 1 | 0
B | 1 | 0
C | 0 | 1
D | 0 | 1
And the next phase independently (with another call to lmFit) as:
Sample | Normal | Phase 2
A | 1 | 0
B | 1 | 0
E | 0 | 1
F | 0 | 1
and so on.
This approach uses one model per phase, which doesn't take into account the sequential progression from phase 1 to phase n.
Approach 2
Adjusting for all phases and the controls, and then creating contrasts of normal vs each phase:
Sample | Normal | Phase 1 | Phase 2
A | 1 | 0 | 0
B | 1 | 0 | 0
C | 0 | 1 | 0
D | 0 | 1 | 0
E | 0 | 0 | 1
F | 0 | 0 | 1
This approach uses 1 model and several contrasts to adjust the expression for each phase.
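A minimal sketch of how Approach 2 might look in limma. The expression matrix `expr` is simulated and the contrast names are placeholders I made up for illustration; only `model.matrix`, `lmFit`, `makeContrasts`, `contrasts.fit` and `eBayes` are real functions.

```r
library(limma)

set.seed(1)
# Hypothetical data: 100 genes x 6 samples (A, B normal; C, D phase 1; E, F phase 2)
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), LETTERS[1:6]))
phase <- factor(c("Normal", "Normal", "Phase1", "Phase1", "Phase2", "Phase2"),
                levels = c("Normal", "Phase1", "Phase2"))

# One column per group, no intercept: this reproduces the table above
design <- model.matrix(~ 0 + phase)
colnames(design) <- levels(phase)

# A single fit for all samples
fit <- lmFit(expr, design)

# One contrast per phase, each against the normal samples
cm <- makeContrasts(
  Phase1_vs_Normal = Phase1 - Normal,
  Phase2_vs_Normal = Phase2 - Normal,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, cm))

# Moderated t-statistics for one contrast, e.g. as a ranking metric for GSEA
t_phase1 <- fit2$t[, "Phase1_vs_Normal"]
```

Each column of `fit2$t` could then be passed to the GSEA step as a separate ranked list.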
Question
If I have understood the limma vignette and the lmFit function correctly, the logFC of each gene will be different, as will the moderated t-statistic, for each contrast. How similar or different will the t-statistics be for the same comparison under the two approaches? Would it be redundant to run both approaches?
IMO Approach 1 is pointless. You can address the same contrasts with greater power in Approach 2: the fold-changes will be very similar between Approach 2 and Approach 1, but the variability estimates should be more precise in Approach 2. However, neither of the approaches you have defined takes account of the sequential progression from normal to phase 1 to phase 2.
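To check this on your own data, something like the sketch below could be used. It fits the Phase 1 vs Normal comparison both ways on simulated data (`expr` and all object names are made up for illustration). With an unweighted fit and a group-means design I would expect the logFC to coincide, since both are just differences of group means; what changes is the number of residual degrees of freedom behind the variance estimate.

```r
library(limma)

set.seed(1)
# Hypothetical data: 100 genes x 6 samples, two per group
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), LETTERS[1:6]))
phase <- factor(c("Normal", "Normal", "Phase1", "Phase1", "Phase2", "Phase2"),
                levels = c("Normal", "Phase1", "Phase2"))

# Approach 2: all samples in one model
design_all <- model.matrix(~ 0 + phase)
colnames(design_all) <- levels(phase)
fit_all <- contrasts.fit(lmFit(expr, design_all),
                         makeContrasts(Phase1 - Normal, levels = design_all))
fit_all <- eBayes(fit_all)

# Approach 1: only the normal and phase 1 samples
keep <- phase != "Phase2"
grp <- droplevels(phase[keep])
design_sub <- model.matrix(~ 0 + grp)
colnames(design_sub) <- levels(grp)
fit_sub <- contrasts.fit(lmFit(expr[, keep], design_sub),
                         makeContrasts(Phase1 - Normal, levels = design_sub))
fit_sub <- eBayes(fit_sub)

# Compare the logFC of the two fits (up to numerical tolerance)
all.equal(unname(fit_all$coefficients[, 1]), unname(fit_sub$coefficients[, 1]))

# Residual degrees of freedom per gene: samples minus fitted coefficients
unique(fit_sub$df.residual)  # Approach 1: 4 samples, 2 coefficients
unique(fit_all$df.residual)  # Approach 2: 6 samples, 3 coefficients

# How closely the moderated t-statistics agree between the two approaches
cor(fit_all$t[, 1], fit_sub$t[, 1])
```

The t-statistics differ because Approach 2 pools the residual variance over all three groups, which is where the extra precision comes from.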
Thanks for the feedback. How should I then account for the progression being sequential? Just by using ~Phase, with the control state included as one of its levels? Or in some other way?
Personally I'd use the following contrasts: i) any phase vs normal; ii) phase 2 vs phase 1. That is, contrastA = (Phase1 + Phase2)/2 - Normal and contrastB = Phase2 - Phase1, using the design in Approach 2. Others would probably disagree. Edit: I should probably state that contrastA was written with the design above in mind; if you have uneven numbers of phase 1 / phase 2 patients you might have to weight this contrast more appropriately.
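As a rough sketch, these two contrasts could be set up as follows with the Approach 2 design. Here contrastB is written as Phase2 - Phase1, so that a positive logFC means higher expression in phase 2; flip the sign if you prefer the other direction. The weighted version at the end is only one possible reading of "weight this contrast more appropriately" (weights proportional to group size), so treat it as an assumption rather than a recommendation.

```r
library(limma)

set.seed(1)
# Hypothetical data: 100 genes x 6 samples, two per group
expr <- matrix(rnorm(100 * 6), nrow = 100,
               dimnames = list(paste0("gene", 1:100), LETTERS[1:6]))
phase <- factor(c("Normal", "Normal", "Phase1", "Phase1", "Phase2", "Phase2"),
                levels = c("Normal", "Phase1", "Phase2"))
design <- model.matrix(~ 0 + phase)
colnames(design) <- levels(phase)

cm <- makeContrasts(
  contrastA = (Phase1 + Phase2) / 2 - Normal,  # any phase vs normal
  contrastB = Phase2 - Phase1,                 # progression between phases
  levels = design
)
fit2 <- eBayes(contrasts.fit(lmFit(expr, design), cm))

# Assumption: with unequal group sizes, weight each phase by its share
# of the diseased samples instead of using 1/2 and 1/2
n <- table(phase)
w1 <- n[["Phase1"]] / (n[["Phase1"]] + n[["Phase2"]])
w2 <- 1 - w1
cm_weighted <- matrix(c(-1, w1, w2), ncol = 1,
                      dimnames = list(colnames(design), "contrastA_weighted"))
fit_weighted <- eBayes(contrasts.fit(lmFit(expr, design), cm_weighted))
```

With two samples per phase, as in the tables above, the weighted contrast reduces to the unweighted contrastA.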