Contrast a linear model like:
expression ~ disease + gender + age
with:
expression ~ disease + gender + disease:gender + age
In both cases, I pull out the p-values for the "disease" parameters.
Each model pulls out a completely independent set of differentially expressed genes (on ~100 disease vs. ~100 control samples).
The latter model generally gives more low p-values, but at different genes. A QQ-plot shows the differences, but it seems that each model is capturing a different signal.
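To make the comparison concrete, here is a minimal sketch of what the "disease" coefficient estimates under each design. It is not my limma run: it uses plain NumPy least squares on one simulated, noise-free gene, and it assumes treatment (dummy) coding, which is R's default for factors. The sample layout, ages, effect sizes, and all variable names are hypothetical.

```python
import numpy as np

def fit_ols(design, y):
    """Return ordinary least-squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical balanced layout: 100 control + 100 disease samples,
# with genders and ages spread evenly across both groups.
disease = np.repeat([0.0, 1.0], 100)            # 0 = control, 1 = disease
gender = np.tile([0.0, 1.0], 100)               # 0 = reference level, 1 = other level
age = np.tile(np.repeat([40.0, 60.0], 2), 50)   # made-up ages

# Simulated gene (assumption): disease shifts expression by 2 units,
# but only in the non-reference gender. No noise is added, so the
# fitted coefficients can be read off directly.
y = 5.0 + 2.0 * disease * gender + 0.01 * age

ones = np.ones_like(y)

# expression ~ disease + gender + age
X_add = np.column_stack([ones, disease, gender, age])
coef_add = fit_ols(X_add, y)

# expression ~ disease + gender + disease:gender + age
X_int = np.column_stack([ones, disease, gender, disease * gender, age])
coef_int = fit_ols(X_int, y)

print("additive model, disease coefficient:      ", coef_add[1])
print("interaction model, disease coefficient:   ", coef_int[1])
print("interaction model, disease:gender coefficient:", coef_int[3])
```

In this toy case the additive model reports a disease coefficient of about 1 (the effect averaged over both genders), while the interaction model reports about 0 for "disease" (the effect in the reference gender only) and about 2 for disease:gender. So the parameter with the same name is testing a different hypothesis in each model, which may be part of why the two gene lists do not overlap.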
Q: What do these observations mean, both biologically and statistically?
Q: What would be the explanation of genes that have a low p-value for the disease:gender interaction term?
It seems common practice to decide on a model, run limma across all genes, and then pull the p-values for the desired contrasts.
Q: Is there some precedent (citation) to say that a different subset of genes is better described by one model, while another set is described by another?
Q: What are methods for doing genome-wide model selection?
Any discussion, pointers to references, or requests for clarification are welcome.
Yes, it's genome-wide. Thanks for the ideas.