Question

Genome-Wide Model Selection And Interaction Terms

13.3 years ago

brentp 24k

Contrast a linear model like:

expression ~ disease + gender + age

with:

expression ~ disease + gender + disease:gender + age

In both cases, I pull out the p-values for the "disease" parameters.

Each of the models pulls out a completely independent set of differentially expressed genes (on ~100 disease vs. ~100 control samples).

The latter model generally gives more low p-values, but at different genes. I can do a QQ-plot and see the differences, but it seems that each model is capturing a different signal.

Q: What do these observations mean, both biologically and statistically?

Q: What would be the explanation of genes that have a low p-value for the disease:gender interaction term?

It seems common practice to decide on a model, run limma across all genes, and then pull the p-values for the desired contrasts.

Q: Is there some precedent (citation) to say that a different subset of genes is better described by one model, while another set is described by another?

Q: What are methods for doing genome-wide model selection?

Any discussion, pointers to references, or requests for clarification are welcome.

model expression • 3.0k views

updated 11.0 years ago by Biostar 20 • written 13.3 years ago by brentp 24k

score 2 · Answer 1 · 2012-04-18

13.3 years ago

Wen.Huang ★ 1.2k

Q2: I guess a significant disease:gender term means that "disease" has different effects in the two sexes, so what you really need is to analyze the sexes separately, at least for those genes whose "disease:gender" term is significant.
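To make the Q2 point concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the sample sizes, the effect size, the 0/1 (treatment) coding, a balanced design, and leaving out age. It uses plain least squares with NumPy, not limma. With this coding, the "disease" coefficient in the interaction model is the disease effect in the reference sex only, while in the additive model it is an average over both sexes, which may be part of why the two models flag different genes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced design: 50 samples in each disease x gender cell.
disease = np.repeat([0, 0, 1, 1], 50)  # 0 = control, 1 = disease
gender = np.repeat([0, 1, 0, 1], 50)   # 0 = reference sex, 1 = other sex

# Simulated gene: disease raises expression by 2.0, but only when gender == 1.
expression = 5.0 + 2.0 * disease * gender + rng.normal(0, 0.5, size=200)


def ols(columns, y):
    """Ordinary least squares coefficients for the given design columns."""
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones_like(expression)
# expression ~ disease + gender
additive = ols([ones, disease, gender], expression)
# expression ~ disease + gender + disease:gender
interaction = ols([ones, disease, gender, disease * gender], expression)


def cell_mean(d, g):
    return expression[(disease == d) & (gender == g)].mean()


# Disease effect within each sex, computed directly from the cell means.
effect_ref = cell_mean(1, 0) - cell_mean(0, 0)
effect_other = cell_mean(1, 1) - cell_mean(0, 1)

print("additive model, disease coef:    ", additive[1])
print("interaction model, disease coef: ", interaction[1])
print("interaction model, disease + d:g:", interaction[1] + interaction[3])
print("per-sex effects (ref, other):    ", effect_ref, effect_other)
```

In this balanced setup the interaction model's "disease" coefficient matches the effect in the reference sex, "disease" plus "disease:gender" matches the effect in the other sex, and the additive model's "disease" coefficient sits halfway between the two. With unbalanced groups or extra covariates such as age, the same interpretation holds but the numbers will no longer match the raw cell means exactly.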

Q3: It is possible for some genes to show sexual dimorphism but not others. I imagine sex-specific genes are likely to be dimorphic.

Q4: In your case, saying that one model is better than the other essentially comes down to whether the "disease:gender" term is significant. I guess if you test this for every gene, that is genome-wide model selection?
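One way to read the Q4 suggestion is as a per-gene comparison of the two nested models. The sketch below is only an illustration under stated assumptions: the expression matrix is simulated, the gene count and effect size are made up, age is omitted, and the fit is ordinary least squares with NumPy rather than limma, so there is no empirical Bayes moderation of the variances. It computes, for every gene at once, the F statistic for adding the disease:gender column to the additive model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 1000 genes, 50 samples per disease x gender cell.
n_genes, n_per_cell = 1000, 50
disease = np.repeat([0, 0, 1, 1], n_per_cell)
gender = np.repeat([0, 1, 0, 1], n_per_cell)
n = disease.size

# Simulated expression matrix (samples x genes). Only the first 50 genes
# carry a disease effect that differs between the sexes.
Y = rng.normal(0, 1.0, size=(n, n_genes))
Y[:, :50] += 1.5 * (disease * gender)[:, None]


def rss(X, Y):
    """Residual sum of squares of an OLS fit, one value per gene (column of Y)."""
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return (resid ** 2).sum(axis=0)


ones = np.ones(n)
X_add = np.column_stack([ones, disease, gender])    # disease + gender
X_int = np.column_stack([X_add, disease * gender])  # + disease:gender

rss_add = rss(X_add, Y)
rss_int = rss(X_int, Y)

# Nested-model F statistic per gene: does the interaction column reduce the
# residual variance by more than chance would suggest?
df_extra = X_int.shape[1] - X_add.shape[1]
df_resid = n - X_int.shape[1]
F = ((rss_add - rss_int) / df_extra) / (rss_int / df_resid)

print("median F, genes with interaction:   ", np.median(F[:50]))
print("median F, genes without interaction:", np.median(F[50:]))
```

Turning these F statistics into p-values needs the F distribution with `df_extra` and `df_resid` degrees of freedom (for example `scipy.stats.f.sf`), followed by a multiple-testing correction across genes. Because only one column is added here, this test is equivalent to the t-test on the disease:gender coefficient in the larger model.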