Question

EdgeR matrix design and comparisons for paired samples

4.5 years ago

silas008 ▴ 170

Hi guys,

I am a bit confused about the statistics for paired samples in edgeR.

I have 4 different treatments, A, B, C and D, each one with 4 samples. 2 of those samples are "before" treatment and the other 2 are "after" treatment.

If I am correct, based on the edgeR manual, the design matrix should be built like this:

> groups <- factor(targets$Group)
> treatment <- factor(targets$Treatment, levels=c("before","after"))
> design <- model.matrix(~groups+treatment)

But my data is a simple table with the genes in the first column and the samples in the other columns. How can I construct the model matrix for this table format?

I think I can simply read the table as a matrix and assign the factors to the samples:

> my_table <- data.matrix(my_table, row.names.default(my_table))
> groups <- factor(c("A1","A2","A3","A4","B1","B2","B3","B4","C1","C2","C3","C4","D1","D2","D3","D4"))
> treatment <- factor(c("before", "before", "after", "after", "before", "before", "after", "after", "before", "before", "after", "after", "before", "before", "after", "after"))
> design <- model.matrix(~groups+treatment)
> y <- DGEList(counts=my_table, group=groups)

But I don't know if this is correct.

Can anyone help with that? I'd really appreciate it.

Thanks

RNA-Seq edgeR • 1.3k views

updated 4.5 years ago by h.mon 35k • written 4.5 years ago by silas008 ▴ 170

score 1 · Answer 1 · 2020-08-07

The "correct" way depends on what A, B, C, D, before and after are, and on what you want to test. That said, it seems to me that a better approach in your case (not that what you did is wrong) would be to create a single factor combining group and treatment:

Group <- factor( paste( groups, treatment, sep = "." ) )
design <- model.matrix( ~ 0 + Group )
y <- DGEList( counts = my_table, group = Group )
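As a rough sketch of how the combined factor might then be used, the code below builds the design and a contrast for "after vs before within A". The sample layout (two "before" then two "after" per letter, in column order) and the simulated counts are assumptions for illustration, not the poster's data, and the edgeR steps only run if the package is installed.

```r
# Assumed layout: 4 letters x (2 before + 2 after), in column order.
groups    <- factor(rep(c("A", "B", "C", "D"), each = 4))
treatment <- factor(rep(rep(c("before", "after"), each = 2), times = 4),
                    levels = c("before", "after"))

# One level per combination, e.g. "A.before", "A.after".
Group  <- factor(paste(groups, treatment, sep = "."))
design <- model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)   # drop the "Group" prefix from column names

# Contrast "after vs before within A": +1 and -1 on the two matching columns.
con <- setNames(numeric(ncol(design)), colnames(design))
con["A.after"]  <-  1
con["A.before"] <- -1

if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts stand in for my_table (hypothetical data).
  counts <- matrix(rnbinom(1000 * 16, mu = 50, size = 10), ncol = 16,
                   dimnames = list(paste0("gene", 1:1000), paste0(groups, 1:4)))
  y   <- edgeR::DGEList(counts = counts, group = Group)
  y   <- edgeR::calcNormFactors(y)
  y   <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  res <- edgeR::glmQLFTest(fit, contrast = con)
  print(edgeR::topTags(res))
}
```

With `~ 0 + Group` every column of the design is the mean of one combination, so any comparison of interest can be written as a difference between columns.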

I have 4 different treatments, A, B, C and D, each one with 4 samples. 2 of those samples are "before" treatment and the other 2 are "after" treatment.

If A, B, C and D are treatments, why do you name the factor that describes them group? And if before and after are times of sampling, why do you call this factor treatment instead of time? Naming variables adequately will make your code easier to understand.
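A small sketch of what this renaming could look like, together with one way to turn a genes-in-first-column table into a count matrix. The table contents, the column name `gene`, the sample IDs and the object names `counts` and `samples` are all hypothetical, and the additive formula is only an example since the thread does not say which comparison is wanted.

```r
# Hypothetical table: gene IDs in the first column, one count column per sample.
sample_ids <- paste0(rep(c("A", "B", "C", "D"), each = 4), 1:4)
set.seed(1)
my_table <- data.frame(gene = paste0("gene", 1:5),
                       matrix(rpois(5 * 16, lambda = 20), nrow = 5,
                              dimnames = list(NULL, sample_ids)))

# Move gene IDs into the row names so the matrix holds only counts.
counts <- as.matrix(my_table[, -1])
rownames(counts) <- my_table$gene

# Sample annotation with names that say what each factor is.
samples <- data.frame(
  row.names = colnames(counts),
  treatment = factor(substr(colnames(counts), 1, 1)),
  time      = factor(rep(rep(c("before", "after"), each = 2), times = 4),
                     levels = c("before", "after"))
)

# Example design using the renamed factors (additive model, as in the question).
design <- model.matrix(~ treatment + time, data = samples)
```

Keeping the annotation in a data frame whose row names match the count matrix columns makes it easy to check that samples and factors line up.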