Hi Megan,
Yes, the statement that the interaction term is the "timing effect different across genotypes" is accurate.
With your time series analysis across two cohorts (genotypes), the interaction term captures the interaction of condition and time (i.e., whether the change in gene expression over time differs between genotypes). If your baselines are WT (for genotype) and Day (for time), then the coefficient you may be interested in would be the following:
timeNight.conditionKO
This interaction effect captures the differentially expressed genes (DEGs) whose change across time (Night relative to Day) differs significantly between genotypes (KO relative to WT); in other words, these genes change differently from Day to Night in KO mice than they do from Day to Night in WT mice. A positive log2FC means the Night vs Day change is larger in KO than in WT, which includes genes that are less down-regulated at Night in KO, not only genes that are more up-regulated.
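To make the arithmetic concrete, here is a minimal sketch in base R. The log2 group means are made-up numbers for a single hypothetical gene, not output from DESeq2; they only illustrate that the interaction is a difference of two Night vs Day differences.

```r
# Hypothetical log2 mean expression of one gene in each group (made-up numbers).
log2_mean <- c(WT_Day = 5.0, WT_Night = 6.0, KO_Day = 5.5, KO_Night = 8.0)

# Night vs Day change within each genotype.
time_effect_WT <- log2_mean[["WT_Night"]] - log2_mean[["WT_Day"]]  # WT is the baseline genotype
time_effect_KO <- log2_mean[["KO_Night"]] - log2_mean[["KO_Day"]]

# The interaction is the difference between those two changes.
interaction_lfc <- time_effect_KO - time_effect_WT
interaction_lfc
```

Here the gene goes up at Night in both genotypes, but more so in KO, so the interaction log2FC is positive.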
If you were interested in the main effect of time (Night relative to Day) in WT, the comparison would be something like this, given the baselines above:
results(dds, name="time_Night_vs_Day")
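For context, here is a minimal sketch of how that call sits next to the related comparisons. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet with no real effects; the sample layout (6 WT and 6 KO samples, alternating Day and Night) is hypothetical and only there so the coefficient names match the ones discussed here.

```r
library(DESeq2)

set.seed(42)
# Simulated counts only: 500 genes x 12 samples, no true differences.
dds <- makeExampleDESeqDataSet(n = 500, m = 12)

# Hypothetical sample annotation; the first level of each factor is the baseline.
dds$condition <- factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO"))
dds$time      <- factor(rep(c("Day", "Night"), times = 6), levels = c("Day", "Night"))
design(dds) <- ~ time + condition + time:condition

dds <- DESeq(dds)
resultsNames(dds)

# Night vs Day in WT (the baseline genotype).
res_time_WT <- results(dds, name = "time_Night_vs_Day")

# Night vs Day in KO: the main time effect plus the interaction term.
res_time_KO <- results(dds, contrast = list(c("time_Night_vs_Day", "timeNight.conditionKO")))

# The interaction itself: how the Night vs Day effect differs in KO relative to WT.
res_interaction <- results(dds, name = "timeNight.conditionKO")
```

Checking resultsNames(dds) on your own object before calling results() is a cheap way to confirm which levels ended up as the baselines.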
DESeq2 fits a generalized linear model (GLM) to the counts for each gene, with coefficients for the main effects and interaction effects listed in the design. Note that once an interaction term is in the design, each main effect is estimated at the reference level of the other factor. For instance, if I had two treatment cohorts (a condition factor with two levels, TreatmentA and TreatmentB) and three time points (a time factor with three levels, D0, D10, and D20), and I was interested in DGE across cohorts (the effect of condition), across time points (the effect of time), and across time and condition together (i.e., whether the effect of time differs by condition), then the design formula may resemble this:
design = ~ time + condition + time:condition
The time:condition term models how the effect of time varies between conditions (genotypes, in your case). It tests whether the effect of one factor (e.g., time) depends on the level of another factor (e.g., genotype).
Related to ATpoint's response, a significant interaction term here captures genes whose Day vs Night change differs between WT and KO, but it does not account for differences in baseline expression levels between genotypes. Without additional filtering, such baseline differences can complicate the interpretation of an interaction effect.
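As a rough illustration of what that design expands to, here is a base R sketch using model.matrix on a hypothetical sample sheet (2 genotypes x 2 time points, 2 replicates each). The column names are base R's; DESeq2 reports the same terms under its own names (for example timeNight.conditionKO for the interaction).

```r
# Hypothetical sample sheet; the first level of each factor is the baseline.
coldata <- data.frame(
  condition = factor(rep(c("WT", "KO"), each = 4), levels = c("WT", "KO")),
  time      = factor(rep(c("Day", "Night"), times = 4), levels = c("Day", "Night"))
)

# Expand the design formula into its model matrix.
mm <- model.matrix(~ time + condition + time:condition, data = coldata)
colnames(mm)

# The interaction column is 1 only for samples that are both KO and Night.
cbind(coldata, interaction = mm[, "timeNight:conditionKO"])
```

Setting the factor levels explicitly, as above, is what fixes WT and Day as the baselines.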
Related to your follow-up: yes, it would be a good idea to examine the DEGs between time points within the KO genotype. Assuming the above baselines, identifying DEGs between time points in the KO samples lets you 1.) find how gene expression changes over time within the KO group, and 2.) check whether the interaction of time and genotype is driven by changes in KO at baseline time, by finding the intersection of DEGs between these test results. If it fits with your analysis and the pattern shows up in exploratory analysis (like hierarchical clustering), you could remove the DEGs due to the main effect of genotype alone at the baseline time (Day) when investigating interaction effects independent of baseline genotype differences. For instance, if your research objective is to determine DGE due exclusively to the interaction of time and genotype, then you may want to identify the DEGs due to genotype alone (those found with the coefficient below) and then keep only the interaction-effect DEGs that do not intersect with them.
condition_KO_vs_WT
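The intersect-and-remove step can be sketched with plain set operations in base R. The gene IDs below are hypothetical placeholders; in practice each vector would come from a results table, for example the row names of the genes passing your adjusted p-value cutoff.

```r
# Hypothetical significant gene lists (placeholders, not real results).
# In practice something like: rownames(res)[which(res$padj < 0.05)]
sig_interaction  <- c("GeneA", "GeneB", "GeneC", "GeneD")  # from timeNight.conditionKO
sig_genotype_day <- c("GeneB", "GeneD", "GeneE")           # from condition_KO_vs_WT

# Interaction DEGs that also differ between genotypes at baseline (Day).
shared <- intersect(sig_interaction, sig_genotype_day)

# Interaction DEGs with no baseline genotype difference.
interaction_only <- setdiff(sig_interaction, sig_genotype_day)

shared
interaction_only
```

Whether dropping the shared genes is appropriate depends on the research question, since a gene can have both a baseline difference and a genuine interaction.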
Also, it's recommended to pre-filter genes that do not reach a minimal count in at least a designated number of samples.
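A minimal sketch of that kind of pre-filter in base R, on a made-up count matrix. The thresholds (at least 10 counts in at least 2 samples) are assumptions for illustration; pick values that suit your group sizes.

```r
# Made-up raw counts: 3 genes x 4 samples.
counts <- matrix(
  c( 0,  2, 1,  0,
    15, 30, 8, 12,
    11,  3, 0,  5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("WT_Day", "WT_Night", "KO_Day", "KO_Night"))
)

min_count   <- 10  # assumed minimal count per sample
min_samples <- 2   # assumed number of samples that must reach it

# Keep genes with at least min_count reads in at least min_samples samples.
keep <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]
filtered

# With a DESeqDataSet the same idea is usually written as:
#   keep <- rowSums(counts(dds) >= 10) >= min_samples
#   dds  <- dds[keep, ]
```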
Hope this is helpful.
Regards,
Maze
Yes, this interaction term would capture genes where the difference between day and night is altered as a consequence of genotype. Note, though, that depending on your question you might need additional filtering on the baseline of gene expression. Interactions can be tricky: for example, the difference between day and night might be much bigger (fold-change wise) in WT than in KO, but the overall expression level of the gene might be much higher or lower in one of the genotypes. Depending on your question, you might want to filter for genes that are (or are not) differentially expressed as a consequence of genotype alone, or check for meaningful DEG patterns by subjecting the genes to hierarchical clustering and heatmap visualization. The interaction alone only cares about the difference in fold change, not the overall expression level difference between genotypes.
So, would it be a good idea then to look at genes in the KO between the two different time points to see whether a difference exists?
I'm not sure if this is super informative, but you could try merging the two factors into a single factor:
And then follow the paper below:
"Conversion to a single factor" from "A guide to creating design matrices for gene expression experiments"
Interesting. This would be a similar and simpler method to analyze the interaction (e.g., the effect of time interacting with genotype). Here's an excerpt from DESeq2's documentation about this approach:
For Megan's use case, it could look like this:
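A minimal sketch of the merged-factor idea in base R. It fits an ordinary linear model (lm) to simulated log2 values for one hypothetical gene, not the negative binomial GLM that DESeq2 uses, so it only illustrates how the two designs relate. The sample layout and the level names (WTDay, WTNight, KODay, KONight) are assumptions.

```r
set.seed(1)

# Hypothetical sample sheet: 2 genotypes x 2 time points, 3 replicates each.
coldata <- data.frame(
  condition = factor(rep(c("WT", "KO"), each = 6), levels = c("WT", "KO")),
  time      = factor(rep(c("Day", "Night"), times = 6), levels = c("Day", "Night"))
)

# Merge the two factors into a single grouping factor with four levels.
coldata$group <- factor(paste0(coldata$condition, coldata$time),
                        levels = c("WTDay", "WTNight", "KODay", "KONight"))

# Simulated log2 expression for one gene (random numbers, no real effect).
coldata$y <- rnorm(12, mean = 5)

# Single-factor design: one coefficient per group mean.
b <- coef(lm(y ~ 0 + group, data = coldata))

# Subgroup comparisons read directly off the group means.
ko_night_vs_day <- b[["groupKONight"]] - b[["groupKODay"]]
wt_night_vs_day <- b[["groupWTNight"]] - b[["groupWTDay"]]

# The difference of those differences reproduces the interaction coefficient.
diff_of_diffs <- ko_night_vs_day - wt_night_vs_day
fit_int <- lm(y ~ time + condition + time:condition, data = coldata)
c(diff_of_diffs = diff_of_diffs,
  interaction   = coef(fit_int)[["timeNight:conditionKO"]])
```

In DESeq2 the analogous setup would be a design of ~ group, with a subgroup comparison such as results(dds, contrast = c("group", "KONight", "KODay")); the level names there would be whatever your merged factor actually contains.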
Yes, I find that much more intuitive. When we include an interaction term in the model, the interpretation of the coefficients is different from a model without an interaction term. I have no idea why, but that's what the vignette states.
Yes, if you want to compare one subgroup to a second subgroup, do it this way. You can make the same comparison with the interaction design, but it's way less easy to read. This way is clear.
Interactions are for when you want to look at all 4 groups together and find genes where the genotype causes the day-to-night change to be different (e.g., a gene which doubles expression from day to night in WT, but triples expression from day to night in KO).
This is what I am trying to do above, where I could find genes that shift between day and night because of the genotype alone.
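On the log2 scale, the doubling vs tripling example works out as follows (a small base R sketch of the arithmetic only):

```r
wt_fold <- 2  # expression doubles from day to night in WT
ko_fold <- 3  # expression triples from day to night in KO

# Interaction log2FC = KO night-vs-day log2FC minus WT night-vs-day log2FC.
interaction_lfc <- log2(ko_fold) - log2(wt_fold)
interaction_lfc  # equals log2(1.5), about 0.585
```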