Question

DESeq2 proper design setting

6

8.7 years ago

Matteo Schiavinato ★ 3.7k

Hi all,

I am performing a differential expression analysis with DESeq2, with these data:

control, 2 replicates
treatment, 2 replicates

So far there is still one obscure part of the manual for me: the design variable that you can set in many commands. I grasped the concept behind it but I am still struggling to understand how to use it properly. At the moment, I am using:

design = ~ condition

How would you set it, and why? Could someone write a couple of lines on how I should use that variable properly?

Any help appreciated!

DESeq2 Differential Expression Design RNA-Seq • 6.2k views

ADD COMMENT • link updated 8.7 years ago by Carlo Yague 9.0k • written 8.7 years ago by Matteo Schiavinato ★ 3.7k

score 4 · Accepted Answer · 2016-12-12

4

8.7 years ago

Carlo Yague 9.0k

In your case I would simply use the following, because expression depends on only one factor, the condition (either control or treatment).

condition = as.factor(c("control","control","treatment","treatment"))
design = ~ condition

However, if your replicates were not processed together, I would also take the batch effect into account.

batch = as.factor(c("rep1","rep2","rep1","rep2"))
design = ~ condition + batch

More examples in this tutorial.
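
As a minimal sketch of how the two factors line up with the four samples in the question, here is a base R example. The data frame `sample_table` and the sample names (`ctrl_1` and so on) are made up for illustration, and the assignment of replicates to batches is an assumption about the experiment.

```r
# Hypothetical sample table: 2 control and 2 treated samples, one row each.
# The row names and the batch assignment are invented for this example.
sample_table <- data.frame(
  condition = factor(c("control", "control", "treatment", "treatment")),
  batch     = factor(c("rep1", "rep2", "rep1", "rep2")),
  row.names = c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
)

# One factor only: expression depends on the condition.
design_simple <- ~ condition

# Two factors: the condition plus the batch each replicate was prepared in.
design_batch <- ~ condition + batch

# Every variable named in the design must be a column of the sample table.
all.vars(design_batch) %in% colnames(sample_table)

# Cross-tabulate to check that each condition is present in each batch.
table(sample_table$condition, sample_table$batch)
```

With DESeq2, a table like this would typically be supplied as `colData` to `DESeqDataSetFromMatrix()`, together with the count matrix and the chosen `design`.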

ADD COMMENT • link 8.7 years ago by Carlo Yague 9.0k

0

By "processed", do you mean sequenced or quality filtered?

ADD REPLY • link 8.7 years ago by Matteo Schiavinato ★ 3.7k

1

I mean the RNA extraction and/or library preparation.

For instance, if the first replicate of each condition was extracted on one day and the second replicate of each condition the day after, you could expect some technical variation to affect the measured gene expression. This is called a batch effect, and it is a nuisance. The good thing is that DESeq2 can take it into account in its model.

If all your replicates were processed in parallel, then there is no batch effect. This is an ideal situation.

If you processed the two control replicates on one day and the two treatment replicates on another day, then there is a batch effect, but you cannot control for it, because batch and condition are completely confounded. This is the worst situation.
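
The difference between the first and the last scenario can be checked with base R before running anything. The sketch below is only an illustration: the data frames `balanced` and `confounded`, the helper `is_full_rank()`, and the `day1`/`day2` labels are hypothetical names. It tests whether the design matrix for `~ condition + batch` has as many independent columns as it has coefficients.

```r
# Balanced layout: each day contains one control and one treated sample.
balanced <- data.frame(
  condition = factor(c("control", "control", "treatment", "treatment")),
  batch     = factor(c("day1", "day2", "day1", "day2"))
)

# Confounded layout: both controls on day 1, both treated samples on day 2.
confounded <- data.frame(
  condition = factor(c("control", "control", "treatment", "treatment")),
  batch     = factor(c("day1", "day1", "day2", "day2"))
)

# Hypothetical helper: a design can be fitted only if the rank of its
# matrix equals its number of columns (one column per coefficient).
is_full_rank <- function(samples) {
  mm <- model.matrix(~ condition + batch, data = samples)
  qr(mm)$rank == ncol(mm)
}

is_full_rank(balanced)    # TRUE: condition and batch can be separated
is_full_rank(confounded)  # FALSE: the two columns are identical
```

In the confounded layout the treatment column and the batch column of the design matrix are the same, so no model can tell the two effects apart.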

ADD REPLY • link 8.7 years ago by Carlo Yague 9.0k

0

Thank you. Mine is the first scenario, so I will add the batch term to the design. However, is there a list, a manual or something similar (other than the official DESeq2 one, which I have already read) that explains clearly which terms can go in the design formula?

ADD REPLY • link 8.7 years ago by Matteo Schiavinato ★ 3.7k

2

The limma user's guide contains a very good introduction to linear models for designed experiments; the model.matrix help page is also worth a look. For your four samples, model.matrix(~ condition) defines a 4x2 matrix containing an 'intercept' column of all ones and a column containing two 0s (for the controls) and two 1s (for the treatments). DESeq2 fits a coefficient for each column in the design matrix.
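
The matrix described here can be reproduced in base R. This is a small sketch that assumes the four samples are ordered as two controls followed by two treated samples.

```r
# Two controls followed by two treated samples (assumed ordering).
# "control" sorts first alphabetically, so it becomes the reference level.
condition <- factor(c("control", "control", "treatment", "treatment"))

# Build the design matrix for a one-factor design.
mm <- model.matrix(~ condition)

dim(mm)                      # 4 rows (samples) and 2 columns (coefficients)
colnames(mm)                 # "(Intercept)" and "conditiontreatment"
mm[, "(Intercept)"]          # all ones
mm[, "conditiontreatment"]   # 0 for the controls, 1 for the treated samples
```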

ADD REPLY • link 8.7 years ago by russhh 5.8k

1

From your question, I feel (but I could be wrong) that you think that only specific terms are allowed in the design formula. This is not the case. The name of the factor doesn't matter at all. For instance, instead of condition = as.factor(c("control","treatment")) you could write drug = as.factor(c("YES","NO")) or azerty = as.factor(c("hello","world")).

The design should simply include all the factors that are expected to affect gene expression in your experiment. In your case, these are the treatment and the batch, whatever names you give them.

ADD REPLY • link 8.7 years ago by Carlo Yague 9.0k

0

You were right, now I have one more piece of the puzzle. New question: what changes when I use ~ or +? I mean, apart from the arithmetic, is there any convention that I should know about? I'm going to look through the limma documentation as well.

ADD REPLY • link 8.7 years ago by Matteo Schiavinato ★ 3.7k

3

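This is the usual syntax for a "formula" (see ?formula in R).

~ means that the following terms will be the factors in your design.

+ is used to add factors. Note that ~ condition + batch fits the same model as ~ batch + condition, although DESeq2's results() reports the comparison for the last variable in the design by default, so the variable of interest is usually put last.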

You also have the operators * and : that are used to specify interactions between factors (not needed in your specific case).

More info here and here in the context of ANOVA and linear regression, respectively.
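
To see what each operator expands to, the terms of a formula can be listed in base R. This is a minimal sketch; `term_labels()` is a hypothetical helper name, and `condition` and `batch` are just the factor names used in this thread.

```r
# Hypothetical helper: list the terms that a formula expands to.
term_labels <- function(f) attr(terms(f), "term.labels")

# "+" adds main effects only.
term_labels(~ condition + batch)

# ":" is the interaction term on its own.
term_labels(~ condition + batch + condition:batch)

# "*" is shorthand for both main effects plus their interaction.
term_labels(~ condition * batch)

# Swapping the order keeps the same set of terms, listed in a different order.
term_labels(~ batch + condition)
```

The second and third calls return the same three terms, which is why `*` is usually described as shorthand for the main effects plus their interaction.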