I am analyzing an RNA-seq experiment in bacteria. We have samples from two different RNA isolation methods, so I want to add batch effects into the edgeR model.matrix to account for it. I wanted to check with the community that my set up is valid for the comparisons I want to make.
I first import the data and the targets, set up the factors (strain and time with time being the batch factor) and then set up the model matrix:
all <- read.table(file = "HTSeq-counts.txt", header = TRUE, row.names = 1)
targets <- read.table(file = "targets.txt", header = TRUE, sep = "\t")
strain <- factor(targets$Strain)
strain <- factor(strain, levels = c("GLU", "BA", "HBA", "SA", "Qs_P", "Qs_S", "WT_P", "WT_S"))
time <- factor(targets$Time)
design <- model.matrix(~0+strain+time)
That leaves me with the following design:
strainGLU strainBA strainHBA strainSA strainQs_P strainQs_S strainWT_P strainWT_S timeTime2
1 1 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 1
3 1 0 0 0 0 0 0 0 1
4 0 1 0 0 0 0 0 0 1
5 0 1 0 0 0 0 0 0 1
6 0 1 0 0 0 0 0 0 1
7 0 0 1 0 0 0 0 0 0
8 0 0 1 0 0 0 0 0 1
9 0 0 1 0 0 0 0 0 1
10 0 0 1 0 0 0 0 0 1
11 0 0 0 1 0 0 0 0 0
12 0 0 0 1 0 0 0 0 1
13 0 0 0 1 0 0 0 0 1
14 0 0 0 1 0 0 0 0 1
attr(,"assign")
[1] 1 1 1 1 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$strain
[1] "contr.treatment"
attr(,"contrasts")$time
[1] "contr.treatment"
The contrasts are set up to compare, for example, strainBA
with strainGLU
and strainQs_S
with strainWT_S
.
I want to make sure that I am setting up the model matrix correctly to account for the batch effects. From reading the manual and looking at posts here and elsewhere, I believe this is correct, but I would feel better if another set of eyes could verify that it is set up correctly.
Thank you! Appreciate your reply!