I have a gene expression data set with the following features:
1) mutation status (binary variable: mutant vs. WT)
2) group (Group1 vs. Group2)
The question is whether the mutation increases the difference in gene expression between patients in Group2 and Group1.
I have considered running t-tests. In a clear-cut example, mutant-Group1 vs. mutant-Group2 would be significant and WT-Group1 vs. WT-Group2 would not be significant. However, I can think of problematic cases (marginally non-significant results, two significant results with very different p-values, etc.).
I have thought about simulating a null distribution for the difference between two pairs of random data sets, but is there a more straightforward method of analysis? For example, does the difference between two t-test statistics also follow a normal distribution (that seems like it could help prioritize genes of interest)? Likewise, is there a single test that can be used instead of two separate tests?
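The "single test" can be sketched as one contrast: the difference between the two within-genotype group differences, (mutant-Group2 - mutant-Group1) - (WT-Group2 - WT-Group1). Rather than comparing two t statistics, this tests the difference of mean differences directly. The sketch below is a minimal stdlib-Python illustration, assuming log-scale expression values and equal variance across the four genotype-by-group cells; the `interaction_t` helper and the toy values are hypothetical. Its estimate and t statistic should match the interaction term of a full 2x2 linear model (in R, `lm(y ~ grp * genotype)`, up to sign depending on the reference levels).

```python
import math
from statistics import mean

def interaction_t(cells):
    """Test the difference-of-differences contrast in a 2x2 design.

    `cells` maps (genotype, group) -> list of log-scale expression values.
    Contrast: (mutant Group2 - mutant Group1) - (WT Group2 - WT Group1).
    Assumes equal variance across the four cells.
    """
    m = {key: mean(values) for key, values in cells.items()}
    contrast = ((m[("mutant", "Group2")] - m[("mutant", "Group1")])
                - (m[("WT", "Group2")] - m[("WT", "Group1")]))
    # Residuals of the saturated 2x2 model are deviations from the cell means.
    rss = sum((x - m[key]) ** 2 for key, values in cells.items() for x in values)
    df = sum(len(values) for values in cells.values()) - 4
    sigma2 = rss / df
    # Every cell mean enters the contrast with weight +1 or -1.
    se = math.sqrt(sigma2 * sum(1 / len(values) for values in cells.values()))
    return contrast, contrast / se, df

# Hypothetical toy values for one gene (not real data).
cells = {
    ("WT", "Group1"): [1.0, 2.0, 3.0],
    ("WT", "Group2"): [3.0, 4.0, 5.0],
    ("mutant", "Group1"): [2.0, 3.0, 4.0],
    ("mutant", "Group2"): [7.0, 8.0, 9.0],
}
contrast, t, df = interaction_t(cells)
print(round(contrast, 3), round(t, 3), df)  # 3.0 2.598 8
```

A two-sided p-value would then come from a t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.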
Thanks for your suggestion.
I thought about doing a 2-way ANOVA, but I didn't think it was quite the right test. More specifically, I would use a 2-way ANOVA to try to factor out co-dependence between variables (like group + technical batch, or group + sample pairing).
So, if I wanted to test whether expression varied with group independently of mutation status, I think a 2-way ANOVA would be the right way to go. However, I want to ask whether I can observe a greater difference in expression between groups when I take mutation status into account (which I don't think a 2-way ANOVA is ideal for). Would you agree?
Just to clarify, what you want to do is what David W suggested (though use "modEven <- lm(y~grp+grp:genotype, data=df)" instead). The ANOVA (or, more simply, a linear model) can give you the genotype effect within each group while controlling for the group effect.
So, if I'm following you, it seems you want to run a 2-way ANOVA that contains an interaction, and you are most interested in the significance and magnitude of that interaction. Does the toy dataset I've now added to my answer fit with what you are trying to do?
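To read the magnitude of the interaction, it helps to see what each coefficient of the interaction model means. In a saturated 2x2 model with treatment coding (WT and Group1 as reference levels, an assumption here), the least-squares fit reproduces the four cell means, so each coefficient is a simple combination of them. The sketch below uses a hypothetical `treatment_coefficients` helper and made-up toy values; it shows that the group effect within mutants is the group main effect plus the interaction.

```python
from statistics import mean

def treatment_coefficients(cells):
    """Coefficients of y ~ grp * genotype under treatment coding.

    Assumes WT and Group1 are the reference levels. Because the 2x2 model is
    saturated, its least-squares fit equals the four cell means, so each
    coefficient is a simple combination of those means.
    """
    m = {key: mean(values) for key, values in cells.items()}
    group = m[("WT", "Group2")] - m[("WT", "Group1")]        # Group2 vs Group1 in WT
    genotype = m[("mutant", "Group1")] - m[("WT", "Group1")]  # mutant vs WT in Group1
    interaction = (m[("mutant", "Group2")] - m[("mutant", "Group1")]) - group
    return {
        "intercept (WT, Group1)": m[("WT", "Group1")],
        "group": group,
        "genotype": genotype,
        "group:genotype": interaction,
        # Group2 vs Group1 within mutants = group main effect + interaction
        "group effect in mutants": group + interaction,
    }

# Hypothetical toy values for one gene (not real data).
cells = {
    ("WT", "Group1"): [1.0, 2.0, 3.0],
    ("WT", "Group2"): [3.0, 4.0, 5.0],
    ("mutant", "Group1"): [2.0, 3.0, 4.0],
    ("mutant", "Group2"): [7.0, 8.0, 9.0],
}
for name, value in treatment_coefficients(cells).items():
    print(f"{name}: {value}")
```

With these toy values the group effect is 2.0 in WT and 5.0 in mutants, and the interaction (3.0) is exactly their difference.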
Thank you both for your help - I hadn't thought about analyzing the data this way.
I apologize if I am missing something, but I'm still not 100% certain whether this addresses my specific question.
For example, how can I use the result to determine the nature of the interaction? How can I tell whether the mutation enhances up-regulation, enhances down-regulation, antagonizes up-regulation, or antagonizes down-regulation? Couldn't other factors also cause the interaction term to improve the model fit?
For example, if the fold-change for WT-Group1 vs. WT-Group2 is 2.5, the biological interpretation would differ depending on whether the fold-change for mutant-Group1 vs. mutant-Group2 is 1.0 (no difference --> the mutant nullifies up-regulation in Group2) or 4.0 (stronger up-regulation --> the mutant enhances up-regulation in Group2).
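One way to read the nature of an interaction is to compare the two within-genotype log2 fold-changes, since on the log2 scale the interaction term equals log2(FC_mutant / FC_WT). For the example above, log2(4.0 / 2.5) is about +0.68 and log2(1.0 / 2.5) is about -1.32: both are non-zero interactions, but with opposite signs and different biological readings. The hypothetical `describe_interaction` helper below labels a point estimate only; it assumes the interaction has already been judged significant, and it treats a reversal of direction as "antagonizes". Whether an interaction is present at all can also depend on the scale (log vs. linear expression), which is one possible non-biological reason for an interaction term to improve the fit.

```python
import math

def describe_interaction(fc_wt, fc_mut):
    """Label how the mutation changes a Group2-vs-Group1 fold-change.

    fc_wt, fc_mut: linear fold-changes (Group2 / Group1) in WT and mutants.
    Works on the log2 scale, where the interaction is log2(fc_mut / fc_wt).
    Only meaningful once the interaction itself is judged significant.
    """
    lfc_wt = math.log2(fc_wt)
    lfc_mut = math.log2(fc_mut)
    interaction = lfc_mut - lfc_wt
    if interaction == 0:
        label = "no interaction"
    elif lfc_wt == 0:
        # No group effect in WT: the mutation creates one.
        label = "mutation introduces " + ("up" if lfc_mut > 0 else "down") + "-regulation"
    elif lfc_wt > 0:
        label = ("mutation enhances up-regulation" if interaction > 0
                 else "mutation antagonizes up-regulation")
    else:
        label = ("mutation enhances down-regulation" if interaction < 0
                 else "mutation antagonizes down-regulation")
    return interaction, label

# The two scenarios from the example: WT fold-change 2.5, mutant 4.0 or 1.0.
for fc_mut in (4.0, 1.0):
    interaction, label = describe_interaction(2.5, fc_mut)
    print(f"{fc_mut}: interaction {interaction:+.2f} ({label})")
```

This prints an interaction of +0.68 (enhances up-regulation) for the 4.0 case and -1.32 (antagonizes up-regulation) for the 1.0 case.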
EDIT
If I use this as a follow-up to the initial pairwise analysis, I think that should provide a complete analysis that I would be satisfied with. If anyone else has suggestions, feel free to provide them. However, this is the best answer that I have seen so far.