Question

edgeR batch effect model matrix sanity check

1

Entering edit mode

6 months ago

kmyers2 ▴ 90

I am analyzing an RNA-seq experiment in bacteria. We have samples from two different RNA isolation methods, so I want to add batch effects into the edgeR model.matrix to account for it. I wanted to check with the community that my set up is valid for the comparisons I want to make.

I first import the data and the targets, set up the factors (strain and time with time being the batch factor) and then set up the model matrix:

all <- read.table(file = "HTSeq-counts.txt", header = TRUE, row.names = 1)
targets <- read.table(file = "targets.txt", header = TRUE, sep = "\t")
strain <- factor(targets$Strain)
strain <- factor(strain, levels = c("GLU", "BA", "HBA", "SA", "Qs_P", "Qs_S", "WT_P", "WT_S"))
time <- factor(targets$Time)
design <- model.matrix(~0+strain+time)

That leaves me with the following design:

   strainGLU strainBA strainHBA strainSA strainQs_P strainQs_S strainWT_P strainWT_S timeTime2
1          1        0         0        0          0          0          0          0         0
2          1        0         0        0          0          0          0          0         1
3          1        0         0        0          0          0          0          0         1
4          0        1         0        0          0          0          0          0         1
5          0        1         0        0          0          0          0          0         1
6          0        1         0        0          0          0          0          0         1
7          0        0         1        0          0          0          0          0         0
8          0        0         1        0          0          0          0          0         1
9          0        0         1        0          0          0          0          0         1
10         0        0         1        0          0          0          0          0         1
11         0        0         0        1          0          0          0          0         0
12         0        0         0        1          0          0          0          0         1
13         0        0         0        1          0          0          0          0         1
14         0        0         0        1          0          0          0          0         1
attr(,"assign")
[1] 1 1 1 1 1 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$strain
[1] "contr.treatment"

attr(,"contrasts")$time
[1] "contr.treatment"

The contrasts are set up to compare, for example, strainBA with strainGLU and strainQs_S with strainWT_S.

I want to make sure that I am setting up the model matrix correctly to account for the batch effects. From reading the manual and looking at posts here and elsewhere, I believe this is correct, but I would feel better if another set of eyes could verify that it is set up correctly.

R edgeR RNA-seq batch-effect • 483 views

ADD COMMENT • link 6 months ago by kmyers2 ▴ 90

score 3 · Accepted Answer · 2024-06-14

3

Entering edit mode

6 months ago

Gordon Smyth ★ 7.7k

The model.matrix is fine. That's the standard way to account for a batch effect.