Question

Differential analysis with replicates using edgeR

0

6.0 years ago

Vasu ▴ 790

Hi,

I have 8 RNA-Seq samples: 4 controls and 4 treatments. I'm interested in doing differential expression analysis with edgeR. The column data are as follows.

Samples Type
Sample1 Control
Sample2 Control
Sample5 Control
Sample6 Control
Sample7 Treatment
Sample8 Treatment
Sample3 Treatment
Sample4 Treatment

In the table above, Sample1, Sample2 [Controls] and Sample3, Sample4 [Treatment] were processed on one day, and Sample5, Sample6 [Controls] and Sample7, Sample8 [Treatment] were processed on another day.

As you can see, the replicates were not all processed together, so there may be a batch effect. Given this, how can I create the design matrix in edgeR for the differential analysis?

RNA-Seq edger differential analysis design matrix • 2.4k views

ADD COMMENT • link updated 5.9 years ago by Biostar 20 • written 6.0 years ago by Vasu ▴ 790

score 2 · Answer 1 · 2018-12-04

2

6.0 years ago

Benn 8.3k

Please read the edgeR manual; it is very clearly written, also for beginners.

ADD COMMENT • link 6.0 years ago by Benn 8.3k

0

Could you please tell me which section I should check for this?

ADD REPLY • link 6.0 years ago by Vasu ▴ 790

1

The section about batch effects. But reading from the start is also wise...

ADD REPLY • link 6.0 years ago by Benn 8.3k

0

I had a look at it. This is the first time I'm working with this kind of data. Could you please tell me whether the following is right?

coldata

Samples Type      Replicates
Sample1 Control   rep1
Sample2 Control   rep1
Sample5 Control   rep2
Sample6 Control   rep2
Sample7 Treatment rep2
Sample8 Treatment rep2
Sample3 Treatment rep1
Sample4 Treatment rep1

group <- factor(coldata$Type)

I created the design matrix as follows:

design <- model.matrix(~ 0 + group + coldata$Replicates)
colnames(design) <- c("Control","Treatment","Repl")

And the design looks like below:

  Control Treatment Repl
1       1        0    0
2       1        0    0
3       0        1    0
4       0        1    0
5       1        0    1
6       1        0    1
7       0        1    1
8       0        1    1
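For reference, here is a minimal base-R sketch of how such an additive design could be built with explicit factor levels. The object name `batch`, the level names `day1`/`day2`, and the column name `Day2` are assumptions made for illustration, not names from the post. Setting row names to the sample IDs makes it easy to check each row against the sample table; with the sample order listed above, the batch column should read 0, 0, 1, 1, 1, 1, 0, 0.

```r
# Sample table as listed in this thread; Batch encodes the processing day
# (hypothetical level names day1/day2 stand in for rep1/rep2).
coldata <- data.frame(
  Samples = paste0("Sample", c(1, 2, 5, 6, 7, 8, 3, 4)),
  Type    = rep(c("Control", "Treatment"), each = 4),
  Batch   = c("day1", "day1", "day2", "day2", "day2", "day2", "day1", "day1")
)

# Explicit levels fix the column order of the design matrix
group <- factor(coldata$Type,  levels = c("Control", "Treatment"))
batch <- factor(coldata$Batch, levels = c("day1", "day2"))

# Additive model: one column per group plus one column for the day-2 shift
design <- model.matrix(~ 0 + group + batch)
colnames(design) <- c("Control", "Treatment", "Day2")
rownames(design) <- coldata$Samples
design
```

Because treatment and processing day are balanced here (two controls and two treatments on each day), the three columns are linearly independent and the batch term can be estimated alongside the group effects.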

Then I used the following commands for the linear model fit and the differential expression analysis.

y <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(y, design, robust=TRUE)

contrast.matrix <- makeContrasts(Treatment-Control, levels=design)
contrast.matrix
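The commands above stop before the actual test. A rough end-to-end sketch, assuming the edgeR package is installed, might look like the following. The counts are simulated purely so that the example runs on its own, the filtering and normalization steps are assumptions about what was done before `estimateDisp`, and the names `counts`, `batch`, `qlf` and `Day2` are illustrative.

```r
library(edgeR)

set.seed(1)
# Simulated counts for illustration only: 1000 genes x 8 samples
counts <- matrix(rnbinom(8000, mu = 100, size = 10), ncol = 8,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("Sample", c(1, 2, 5, 6, 7, 8, 3, 4))))

group <- factor(rep(c("Control", "Treatment"), each = 4))
batch <- factor(c("day1", "day1", "day2", "day2", "day2", "day2", "day1", "day1"))

design <- model.matrix(~ 0 + group + batch)
colnames(design) <- c("Control", "Treatment", "Day2")

# Build the DGEList, drop low-count genes, and normalize library sizes
y <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Dispersion estimation and quasi-likelihood fit, as in the post
y   <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)

# Treatment vs Control, adjusted for processing day
contrast.matrix <- makeContrasts(Treatment - Control, levels = design)
qlf <- glmQLFTest(fit, contrast = contrast.matrix)
topTags(qlf)
```

The contrast compares the two group columns only; the `Day2` column absorbs the day-to-day shift and is not part of the comparison.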

Do you think this is right?

ADD REPLY • link 6.0 years ago by Vasu ▴ 790

0

It looks alright, but I have one question. You call the batch "replicate"; does that mean the samples came from the same sample? Is it technical replication?

ADD REPLY • link 6.0 years ago by Benn 8.3k

0

Yes, they were the same sample, but the RNA extraction was done on the next day.

As mentioned above, Sample1, Sample2 [Controls] and Sample3, Sample4 [Treatment] were processed on one day, and Sample5, Sample6 [Controls] and Sample7, Sample8 [Treatment] were processed on another day.

All the samples are from the same cell-line.

ADD REPLY • link 6.0 years ago by Vasu ▴ 790

0

How can you be sure that this 'batch' effect is going to bias your results? What evidence have you seen? In many experiments, samples are processed on separate days with minimal or no effect on the end results. If we ran a time-course experiment, for example, and treated time as a batch effect, we would wipe out the very time-dependent differences that we wanted to find.
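One common way to look for that evidence is an MDS plot of the samples, labelled by condition and coloured by processing day. The sketch below assumes edgeR is installed and uses simulated counts, so the picture itself carries no meaning; the names `counts`, `group` and `batch` are illustrative. With real data, samples separating by day rather than by condition would suggest that a batch term is worth including.

```r
library(edgeR)

set.seed(1)
# Simulated counts for illustration only: 1000 genes x 8 samples
counts <- matrix(rnbinom(8000, mu = 100, size = 10), ncol = 8,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("Sample", c(1, 2, 5, 6, 7, 8, 3, 4))))

group <- factor(rep(c("Control", "Treatment"), each = 4))
batch <- factor(c("day1", "day1", "day2", "day2", "day2", "day2", "day1", "day1"))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Label points by condition and colour them by processing day
plotMDS(y, labels = as.character(group), col = as.integer(batch))
```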

ADD REPLY • link 5.9 years ago by Kevin Blighe 88k