Hi everyone,
I am analyzing data from microarrays with batch effect. To correct batch effect I am including this variable in the formula for the linear model:
formula <- paste("~0 ", "Group ", "Batch", sep = "+ ")
The group variable includes two categories: M_GO and NM_GO, whereas the batch variable includes three categories: Batch1, Batch2 and Batch3.
After that, I created the design matrix, whose column names are the following:
M_GO NM_GO Batch1 Batch2
Both categories for group appear because there is no intercept. For the batch variable, however, only two of the three categories are present. Samples with Batch1=1 & Batch2=0 will be Batch1, samples with Batch1=0 & Batch2=1 will be Batch2, and samples with Batch1=0 & Batch2=0 will be Batch3.
However, my question is about the contrast matrix. As Batch3 does not appear in the design matrix, the only comparison that can be performed for batch is "Batcheffect = Batch1 - Batch2".
contrasts <- c("GO_MvsNM = M_GO - NM_GO", "Batcheffect = Batch1 - Batch2")
Is this contrasts matrix taking into account the 3 categories for batch effect? Am I doing the analysis to correct batch effect properly?
Thank you very much in advance!!
I thought I knew the answer to this, but after some digging I found out that I do not... but I have a few comments. First, this question is not really about batch effect at all, but rather about why your design matrix has one column fewer than the sum of the number of groups and batches in your experiment (2 + 3 = 5). Try troubleshooting this more general question; I suspect that the apparent link to batch effect was a bit of a red herring in your problem solving. Check out https://stackoverflow.com/a/17284028 and the comments here https://stats.stackexchange.com/q/174976
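To see where the missing column goes, here is a minimal sketch in base R. The sample sheet is made up (one hypothetical sample per group/batch combination), so treat it as an illustration rather than your actual data. With R's default treatment coding, the first level of the second factor is absorbed as the reference, so the exact column names may differ from the ones you listed.

```r
# Hypothetical sample sheet: 2 groups x 3 batches, one sample per combination.
samples <- data.frame(
  Group = factor(rep(c("M_GO", "NM_GO"), times = 3)),
  Batch = factor(rep(c("Batch1", "Batch2", "Batch3"), each = 2))
)

# Same formula as in the question: no intercept, Group first, then Batch.
design <- model.matrix(~ 0 + Group + Batch, data = samples)
print(colnames(design))

# Indicator columns for every level of both factors (2 + 3 = 5 columns).
full <- cbind(model.matrix(~ 0 + Group, data = samples),
              model.matrix(~ 0 + Batch, data = samples))

# The two Group columns sum to 1, and so do the three Batch columns,
# so one of the five columns is redundant and the rank stays at 4.
rank_design <- qr(design)$rank
rank_full   <- qr(full)$rank
print(c(design = rank_design, full = rank_full))
```

Both matrices should have the same rank, which is why model.matrix() only keeps four columns: the batch left out is still accounted for, as the reference level.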
Realistically, why are you trying to find genes DE with respect to batches? Your code for making contrasts between groups looks perfect. If you really wish to compare batches 1 vs. 2, 1 vs. 3, and 2 vs. 3, simply change the order of your variables in model.matrix(), or in your case, in the formula, and you will find columns for all 3 batches in the model matrix.
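As a rough sketch of that reordering, again on a made-up sample sheet (one hypothetical sample per group/batch combination) and assuming R's default treatment coding:

```r
# Hypothetical sample sheet: 2 groups x 3 batches, one sample per combination.
samples <- data.frame(
  Group = factor(rep(c("M_GO", "NM_GO"), times = 3)),
  Batch = factor(rep(c("Batch1", "Batch2", "Batch3"), each = 2))
)

# Batch first: all three batch levels now get their own column.
design_batch_first <- model.matrix(~ 0 + Batch + Group, data = samples)
print(colnames(design_batch_first))

# Original ordering, for comparison.
design_group_first <- model.matrix(~ 0 + Group + Batch, data = samples)

# Stacking both designs side by side does not raise the rank above 4,
# i.e. the two parameterisations describe the same model.
combined_rank <- qr(cbind(design_batch_first, design_group_first))$rank
print(combined_rank)
```

Note the trade-off: with Batch first, Group loses a column instead, and the single remaining Group coefficient is already the NM_GO minus M_GO difference. The fitted model is the same either way; only which comparisons are directly available as columns changes.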