Hi all,
although this topic has been much discussed, I'm not sure how to proceed in my situation. I'm analyzing an RNA-seq experiment looking for differentially expressed genes using edgeR. The experiment was designed as follows:
Sample Treatment Batch
TreatA-1 TreatA Date1
TreatA-2 TreatA Date1
TreatA-3 TreatA Date2
TreatB-1 TreatB Date3
TreatB-2 TreatB Date3
TreatB-3 TreatB Date4
TreatC-1 TreatC Date1
TreatC-2 TreatC Date1
TreatC-3 TreatC Date2
TreatD-1 TreatD Date3
TreatD-2 TreatD Date3
TreatD-3 TreatD Date4
Here the MDS plot:
I can see clearly the batch effect between replicates, where the third one (that is from a different batch) doesn't cluster correctly with the other two. Moreover, I wonder if the big differences observed between treatments AC and BD (explained mainly by dimension 1) is due to the different batches (Date1-Date2 vs Date3-Date4).
I have been trying to include these batch considerations in my design, but it seems the matrix is not of full rank.
group <- as.factor(c("TreatA","TreatA","TreatA",
"TreatB","TreatB","TreatB",
"TreatC","TreatC","TreatC",
"TreatD","TreatD","TreatD"))
batch <- factor(c("Date1","Date1","Date2","Date3","Date3","Date4",
"Date1","Date1","Date2","Date3","Date3","Date4"))
design <- model.matrix(~0+group+batch)
design
groupTreatA groupTreatB groupTreatC groupTreatD batchDate2 batchDate3 batchDate4
1 1 0 0 0 0 0 0
2 1 0 0 0 0 0 0
3 1 0 0 0 1 0 0
4 0 1 0 0 0 1 0
5 0 1 0 0 0 1 0
6 0 1 0 0 0 0 1
7 0 0 1 0 0 0 0
8 0 0 1 0 0 0 0
9 0 0 1 0 1 0 0
10 0 0 0 1 0 1 0
11 0 0 0 1 0 1 0
12 0 0 0 1 0 0 1
attr(,"assign")
[1] 1 1 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
attr(,"contrasts")$batch
[1] "contr.treatment"
First something that I still do not quite understand: where is the batch Date 1? I didn't use any intercept so I don't know how to interpret that is not in the design. Second, is there any way to include the batch effect avoiding the not full rank issue?
Thanks in advance.