Question

design matrix considering treatment, timepoints and batches

0

Entering edit mode

3.6 years ago

Iván ▴ 60

Hi,

I'm analysing a microarray with 59 samples distributed across the following groups and timepoints:

SampleName  Group   Batch   Timepoint
sample1 control_30  1   30
sample2 control_30  2   30
sample3 control_30  2   30
sample4 control_30  3   30
sample5 control_30  4   30
sample6 control_30  4   30
sample7 control_30  4   30
sample8 control_30  4   30
sample9 treat1_30   1   30
sample10    treat1_30   1   30
sample11    treat1_30   2   30
sample12    treat1_30   2   30
sample13    treat2_30   1   30
sample14    treat2_30   1   30
sample15    treat2_30   2   30
sample16    treat2_30   2   30
sample17    treat2_30   3   30
sample18    treat3_30   1   30
sample19    treat3_30   1   30
sample20    treat3_30   2   30
sample21    treat3_30   2   30
sample22    control_60  5   60
sample23    control_60  5   60
sample24    control_60  6   60
sample25    control_60  6   60
sample26    treat1_60   5   60
sample27    treat1_60   5   60
sample28    treat1_60   6   60
sample29    treat1_60   6   60
sample30    treat2_60   5   60
sample31    treat2_60   5   60
sample32    treat2_60   6   60
sample33    treat2_60   6   60
sample34    treat2_60   3   60
sample35    treat3_60   5   60
sample36    treat3_60   5   60
sample37    treat3_60   6   60
sample38    treat3_60   6   60
sample39    control_90  7   90
sample40    control_90  7   90
sample41    control_90  8   90
sample42    control_90  8   90
sample43    treat1_90   3   90
sample44    treat1_90   3   90
sample45    treat1_90   3   90
sample46    treat1_90   3   90
sample47    treat2_90   7   90
sample48    treat2_90   7   90
sample49    treat2_90   8   90
sample50    treat2_90   8   90
sample51    treat3_90   7   90
sample52    treat3_90   7   90
sample53    treat3_90   8   90
sample54    treat3_90   8   90
sample55    treat3_90   3   90
sample56    treat4_90   7   90
sample57    treat4_90   7   90
sample58    treat4_90   8   90
sample59    treat4_90   8   90

We're interested in evaluating the differences between each treatment VS. control within each timepoint, so each treatN in timepoint N is compared against the respective control_N.

My current code and design is:

# Design matrix
batch <- factor(y$targets$Batch)
group <- factor(y$targets$Group)
design <- model.matrix(~0 + batch + group)

colnames(design) <- gsub("group", "", colnames(design))
colnames(design) <- gsub("batch ", "", colnames(design))

contr.matrix <- makeContrasts(
  # timepoint 30
  Cont.VS.Treat1.30dpi = treat1_30-control_30,
  Cont.VS.Treat2.30dpi = treat2_30-control_30,
  Cont.VS.Treat3.30dpi = treat3_30-control_30,
  # timepoint 60
 Cont.VS.Treat1.60dpi = treat1_60-control_60,
  Cont.VS.Treat2.60dpi = treat2_60-control_60,
  Cont.VS.Treat3.60dpi = treat3_60-control_60,
  # timepoint 90
 Cont.VS.Treat1.90dpi = treat1_90-control_90,
  Cont.VS.Treat2.90dpi = treat2_90-control_90,
  Cont.VS.Treat3.90dpi = treat3_90-control_90,
  levels = colnames(design))

fit <- lmFit(y, design)
fit2 <- contrasts.fit(fit, contr.matrix)
fit2 <- eBayes(fit2,trend=TRUE,robust=TRUE)

Then, to extract the DEGs between treat1 and control in timepoint 30 while accounting for batch effects I get the topTable from the relevant coefficient, as follows:

topTable(fit2, coef = 1, adjust.method = "BH", n=5, sort.by = "P")

While for treat2 and control in timepoint 30 would be:

topTable(fit2, coef = 2, adjust.method = "BH", n=5, sort.by = "P")

etc.

I would like to assert whether this is the correct way to construct my design matrix and contrasts as I'm getting very different number of DEGs from a previous analysis performed by someone else in our group with the same dataset. Help will be greatly appreciated.

thank you!

Ivàn

limma design microarray experimental • 943 views

ADD COMMENT • link updated 3.6 years ago by Kevin Blighe 88k • written 3.6 years ago by Iván ▴ 60

0

Entering edit mode

Anyone willing to input on this? Thank you! :-)

ADD REPLY • link 3.6 years ago by Iván ▴ 60

score 0 · Answer 1 · 2021-04-29

I don't see anything of major concern; however, one thing that may result in extra bias is that your controls are sometimes being used from batches not also used by treatment. For example:

treat1_30 (batch 1 and 2) vs control_30 (batch 1, 2,3, and 4)

To be frank, you may consider first removing the batch effect from the data via limma::removeBatchEffect(), or use batch as a blocking factor (special functionality supplied by limma).

Just some extra things to try.

Kevin