Hello, I have a microarray gene expression data set consisting of 30 samples from two groups (wild type, WT, and mutant, MU), collected at five time points. I want to do a differential expression analysis between the conditions, but I want the analysis to take the time points into account as well. My data look like this:

| Samples | Condition | Time point (h) |
|---|---|---|
| 1-3 | WT | 32 |
| 4-6 | WT | 42 |
| 7-9 | WT | 49 |
| 10-12 | WT | 56 |
| 13-15 | WT | 66 |
| 16-18 | MU | 32 |
| 19-21 | MU | 42 |
| 22-24 | MU | 49 |
| 25-27 | MU | 56 |
| 28-30 | MU | 66 |
And here is my code:
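library(limma)  # lmFit() and eBayes() come from limma
# 'data' holds the normalized log-expression values: genes in rows, the 30 samples in columns (order as above)
condition <- c(rep("WT", 15), rep("MU", 15))
time_point <- rep(rep(c("32h", "42h", "49h", "56h", "66h"), each = 3), times = 2)
# Combine condition and time point into one 10-level grouping factor (e.g. "WT_32h")
exp <- factor(paste(condition, time_point, sep = "_"))
design <- model.matrix(~exp)
fit <- lmFit(data, design)
fit <- eBayes(fit)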
When I construct my design matrix, the first column is the intercept (all 1s) and I cannot see WT_32h in the matrix. Is there a better way to do this, or should I trust the results from this code?
Thank you in advance
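To see which group is actually missing, the grouping factor can be rebuilt on its own and the design matrix inspected. This is a minimal base-R sketch; the single simulated gene `y` is made-up data for illustration only. Because `factor()` orders levels alphabetically, the level absorbed into the intercept is the first one, `MU_32h` (not `WT_32h`), and every other coefficient is that group's mean minus the `MU_32h` mean.

```r
# Rebuild the grouping factor exactly as in the question (no expression data needed)
condition <- c(rep("WT", 15), rep("MU", 15))
time_point <- rep(rep(c("32h", "42h", "49h", "56h", "66h"), each = 3), times = 2)
exp <- factor(paste(condition, time_point, sep = "_"))

# factor() sorts levels alphabetically, so the "MU_..." levels come first
levels(exp)
# "MU_32h" "MU_42h" "MU_49h" "MU_56h" "MU_66h" "WT_32h" "WT_42h" "WT_49h" "WT_56h" "WT_66h"

# With an intercept, the first level (MU_32h) is the baseline and gets no column
design <- model.matrix(~exp)
colnames(design)

# One simulated gene (illustrative only) to show what the coefficients mean
set.seed(1)
y <- rnorm(30, mean = 8, sd = 0.3)
coefs <- lm.fit(design, y)$coefficients
group_means <- tapply(y, exp, mean)

# Intercept = baseline group mean; other coefficients = group mean minus baseline mean
all.equal(unname(coefs["(Intercept)"]), unname(group_means["MU_32h"]))  # TRUE
all.equal(unname(coefs["expWT_32h"]),
          unname(group_means["WT_32h"] - group_means["MU_32h"]))      # TRUE

# relevel() makes WT_32h the baseline instead, if that reference is preferred
exp_wt <- relevel(exp, ref = "WT_32h")
design_wt <- model.matrix(~exp_wt)
colnames(design_wt)
```

Either baseline describes the same fitted model; the choice only changes what each coefficient compares against.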
You can trust the results, but you have to be careful about the contrasts to get what you want. If you specify your design as

design = model.matrix( ~ 0 + exp )

the matrix will have one column for each condition/time-point group (10 columns) and no intercept. You may want to read section "9.6 Time Course Experiments" in the limma User's Guide.
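A minimal base-R sketch of what "being careful about the contrasts" means with the `~0 + exp` design: each coefficient is a group mean, and a contrast matrix turns those means into the comparisons of interest, here MU minus WT at each time point. The simulated gene `y` and the names `time_levels`, `cont` and `mu_vs_wt` are hypothetical and for illustration only; plain least squares (`lm.fit`) stands in for limma's fitting.

```r
# Same grouping factor as in the question
condition <- c(rep("WT", 15), rep("MU", 15))
time_levels <- c("32h", "42h", "49h", "56h", "66h")
time_point <- rep(rep(time_levels, each = 3), times = 2)
exp <- factor(paste(condition, time_point, sep = "_"))

# Cell-means design: one column per group, no intercept
design <- model.matrix(~0 + exp)
colnames(design) <- levels(exp)

# Contrast matrix: rows = groups, one column per time point, coding MU minus WT
cont <- sapply(time_levels, function(tp) {
  v <- setNames(numeric(nlevels(exp)), levels(exp))
  v[paste0("MU_", tp)] <- 1
  v[paste0("WT_", tp)] <- -1
  v
})
colnames(cont) <- paste0("MUvsWT_", time_levels)

# Simulated gene (illustrative only): no group differences except +2 in MU at 66h
set.seed(2)
y <- rnorm(30, mean = 8, sd = 0.2)
y[exp == "MU_66h"] <- y[exp == "MU_66h"] + 2

# Coefficients are group means; applying the contrasts gives MU - WT per time point
coefs <- lm.fit(design, y)$coefficients
mu_vs_wt <- drop(t(cont) %*% coefs)
round(mu_vs_wt, 2)
```

In limma the same kind of matrix can be built with `makeContrasts(MU_32h - WT_32h, ..., levels = design)` and applied with `contrasts.fit(fit, cont)` followed by `eBayes()`, so that `topTable()` ranks genes by these MU vs WT differences rather than by the raw group means.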
Thank you so much for your answer. I tried your suggestion, but there seems to be a problem. When I use "design = model.matrix( ~ 0 + exp )", about 95% of the genes are significantly changed, which looks wrong. With my original code it was about 30%.
What is the disadvantage of using "~exp" instead of "~0 + exp" in that case?
Thank you.
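One likely explanation, assuming the percentages came from calling `topTable(fit)` or `decideTests(fit)` on the `~0 + exp` fit without any contrasts: the coefficients of that fit are group mean log-expression values, so the test asks whether each gene's expression is zero rather than whether the groups differ, and nearly every expressed gene passes. The base-R sketch below mimics this with `lm()` and its overall F-test on simulated genes that have no group differences at all; ordinary least squares stands in for limma's moderated statistics, so it only illustrates the direction of the effect, not the exact percentages.

```r
# Grouping factor as in the question
condition <- c(rep("WT", 15), rep("MU", 15))
time_point <- rep(rep(c("32h", "42h", "49h", "56h", "66h"), each = 3), times = 2)
exp <- factor(paste(condition, time_point, sep = "_"))

# p-value of the overall F-test reported by summary.lm()
overall_p <- function(fit) {
  f <- summary(fit)$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# 200 simulated null genes (illustrative only): each has its own average
# log2 expression level, but no difference at all between the 10 groups
set.seed(3)
n_genes <- 200
p_no_int <- p_int <- numeric(n_genes)
for (g in seq_len(n_genes)) {
  y <- rnorm(30, mean = runif(1, 6, 12), sd = 0.3)
  # ~0 + exp: the overall F-test asks whether all 10 group means are zero
  p_no_int[g] <- overall_p(lm(y ~ 0 + exp))
  # ~exp: the overall F-test asks whether the 9 non-intercept coefficients
  # (differences from the baseline group) are zero
  p_int[g] <- overall_p(lm(y ~ exp))
}
mean(p_no_int < 0.05)  # essentially every gene comes out "significant"
mean(p_int < 0.05)     # close to the nominal 5% false-positive rate
```

So `~0 + exp` has no real disadvantage as a model; the two designs fit the same group means. The difference is only in what the default coefficients test, and with either parameterisation the MU vs WT questions are best asked through explicit contrasts.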