Question

Complicated Multi-factor Differential Expression

5.9 years ago

gradstudentNew ▴ 50

Hello all. Merry Christmas and happy holidays!

I am currently conducting a study and was hoping to get some advice on my model for differential expression. Currently, I have the following data:

Sample ||  Sex   ||  Age  || TissueType ||  Individual  || Diseased
ID1    || Male   ||   70  || Hemisphere || Individual 1 || Affected
ID2    || Male   ||   70  || Cortex     || Individual 1 || Affected
ID3    || Female ||   80  || Hemisphere || Individual 2 || Unaffected
ID4    || Female ||   80  || Cortex     || Individual 2 || Unaffected 
ID5    || Male   ||  100  || Hemisphere || Individual 3 || Affected

.....

I have set up the following models for the sleuth likelihood ratio test (LRT) because I am interested in whether there are genes differentially expressed between cases and controls across these two brain regions. I was reading about nested models and was wondering if the following makes sense:

Full Model: ~ Diseased + Sex + Age + Sex:Age + Sex:Diseased + Age:Diseased + Diseased*TissueType + Diseased*Individual

Reduced Model: ~ Sex + Age + Sex:Age + Sex:Diseased + Age:Diseased + Diseased*TissueType + Diseased*Individual

I tested this and found 6000 genes with a p-value of 0, so I realized that something is wrong haha. I'm also not sure whether to include TissueType and Individual as additive terms (+ TissueType and + Individual). I would really appreciate any advice. Thank you so much!

RNAseq Multifactor Differential Expression Sleuth

updated 5.8 years ago by Charles Warden 8.3k • written 5.9 years ago by gradstudentNew ▴ 50

score 0 · Answer 1 · 2018-12-27

That certainly sounds like an over-fit model.
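It may also be worth checking mechanically which terms the two formulas in the question actually contain. The sketch below is a minimal illustration in Python, assuming the usual R formula rule that `a*b` means `a + b + a:b`. The `expand_terms` helper is hypothetical (it is not part of sleuth or R) and only handles `+`, `:` and two-way `*`.

```python
def expand_terms(formula):
    """Expand an R-style formula right-hand side into a set of terms.

    Simplified, hypothetical helper: supports '+', ':' and two-way 'a*b'
    (treated as a + b + a:b). Each term is a frozenset of variable names,
    so 'Sex:Diseased' and 'Diseased:Sex' count as the same term.
    """
    rhs = formula.split("~")[-1]  # works with or without a leading '~'
    terms = set()
    for chunk in rhs.split("+"):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "*" in chunk:
            a, b = (v.strip() for v in chunk.split("*"))
            terms.update({frozenset([a]), frozenset([b]), frozenset([a, b])})
        else:
            terms.add(frozenset(v.strip() for v in chunk.split(":")))
    return terms


# Formulas copied from the question.
full = ("~ Diseased + Sex + Age + Sex:Age + Sex:Diseased + Age:Diseased"
        " + Diseased*TissueType + Diseased*Individual")
reduced = ("Sex + Age + Sex:Age + Sex:Diseased + Age:Diseased"
           " + Diseased*TissueType + Diseased*Individual")

full_terms = expand_terms(full)
reduced_terms = expand_terms(reduced)

# Terms present in the full model but missing from the reduced model.
only_in_full = full_terms - reduced_terms
print(sorted(":".join(sorted(t)) for t in only_in_full))  # prints []
```

Under that expansion rule the list is empty: `Diseased*TissueType` and `Diseased*Individual` put the `Diseased` main effect back into the reduced model, so both formulas describe the same ten terms and there is nothing left for the LRT to compare.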

Perhaps use PCA and clustering to determine what has the greatest overall effect, and then see what you can identify if you only factor out 1 or 2 other variables?

Or, even with 1 or 2 total variables, adding a fold-change filter on top of the p-value / FDR filter can sometimes help narrow the results down to a few hundred up-regulated and a few hundred down-regulated genes.
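As a rough illustration of combining the two filters, here is a small Python sketch. Everything in it is an assumption for demonstration: the gene names and numbers are made-up toy values, and the cutoffs (FDR < 0.05, |log2 fold-change| >= 1) are common conventions rather than anything recommended in this thread. The FDR step is a plain Benjamini-Hochberg adjustment.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy results: gene -> (mean in cases, mean in controls, p-value).
# All means are positive, so no pseudocount is needed for the log ratio.
results = {
    "geneA": (80.0, 10.0, 0.0001),
    "geneB": (11.0, 10.0, 0.0002),
    "geneC": (5.0, 40.0, 0.001),
    "geneD": (30.0, 10.0, 0.6),
}

FDR_CUTOFF = 0.05    # assumed threshold
LOG2FC_CUTOFF = 1.0  # assumed threshold (two-fold change)

genes = list(results)
qvalues = bh_adjust([results[g][2] for g in genes])

up, down = [], []
for gene, q in zip(genes, qvalues):
    case_mean, control_mean, _ = results[gene]
    log2fc = math.log2(case_mean / control_mean)
    if q < FDR_CUTOFF and log2fc >= LOG2FC_CUTOFF:
        up.append(gene)
    elif q < FDR_CUTOFF and log2fc <= -LOG2FC_CUTOFF:
        down.append(gene)

print("up:", up)
print("down:", down)
```

In this toy set, `geneB` passes the FDR cutoff but is dropped for its small fold-change, while `geneD` has a large fold-change but fails the FDR cutoff, so only `geneA` (up) and `geneC` (down) remain.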