Question

Experimental design RNAseq differential expression

0

Entering edit mode

5.8 years ago

Pietro ▴ 240

Dear users,

I am struggling to understand if my design is correct. I found edgeR section 3.3.1 similar to my situation but I am not that confident.

Here is my experimental design:

samples_table

sampleId    cellLine    treatment   time        IC
s1                 a      vehicle      0         S
s2                 a         drug     48         S
s3                 a         drug    168         S
s4                 b      vehicle      0         S
s5                 b         drug     48         S
s6                 b         drug    168         S
s7                 c      vehicle      0         S
s8                 c         drug     48         S
s9                 c         drug    168         S
s10                d      vehicle      0         S
s11                d         drug     48         S
s12                d         drug    168         S
s13                e      vehicle      0         R
s14                e         drug     48         R
s15                e         drug    168         R
s16                f      vehicle      0         R
s17                f         drug     48         R
s18                f         drug    168         R

I have 6 cell lines treated with a drug and RNA sequenced after 48 and 168 hours of treatment. Last column indicates if the cell line is susceptible or resistant to another compound.

I would like to find how resistant cell lines differentially respond to the drug at 48 and/or at 168 hours compared to resistant ones.

Here is my approach:

group <- factor(paste(samples_table$IC, samples_table$time, sep="."))
y <- DGEList(counts.keep, group=group)
y <- calcNormFactors(y)
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)
y <- estimateDisp(y, design)

# Quasi-likelihood test
fit <- glmQLFit(y, design)

design

   R.0 R.168 R.48 S.0 S.168 S.48
1    0     0    0   1     0    0
2    0     0    0   0     0    1
3    0     0    0   0     1    0
4    0     0    0   1     0    0
5    0     0    0   0     0    1
6    0     0    0   0     1    0
7    0     0    0   1     0    0
8    0     0    0   0     0    1
9    0     0    0   0     1    0
10   1     0    0   0     0    0
11   0     0    1   0     0    0
12   0     1    0   0     0    0
13   1     0    0   0     0    0
14   0     0    1   0     0    0
15   0     1    0   0     0    0
attr(,"assign")
[1] 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"

my.contrasts <- makeContrasts(
  S48 = S.48 - S.0,
  S168 = S.168 - S.0,
  S168vs48 = S.168 - S.48,
  R48 = R.48 - R.0,
  R168 = R.168 - R.0,
  R168vs48 = R.168 - R.48,
  RvsS.0 = R.0 - S.0,
  RvsS.48 = (R.48 - R.0) - (S.48 - S.0),
  RvsS.168 = (R.168 - R.0) - (S.168 - S.0),
  all = (R.48 + R.168 - R.0) - (S.48 + S.168 - S.0),
  levels = design
)

# to find genes that differentially respond at 48h between resistant and susceptible cell lines

qlf <- glmQLFTest(fit, 
                  contrast = my.contrasts[ , "RvsS.48"])

topTags(qlf)

What do you think? Shouldn't I account the fact that I have different cell lines as a sort of batch effect?

Thanks a lot

Pietro

PS: cross posted to https://support.bioconductor.org/p/119310/

limma edger RNA-Seq batch differential expression • 2.2k views

ADD COMMENT • link updated 5.8 years ago by Biostar 20 • written 5.8 years ago by Pietro ▴ 240

0

Entering edit mode

As you posted on Bioconductor, Aaron or Gordon will likely respond, and their answer will supersede any answer given here. On face value —yes— you should be adjusting for cell-line by, for example, including cellLine in your design formula. So, something like:

~ cellLine + group

Before doing this, I would also just check via a PCA bi-plot to see how cellLine is distributed across your samples

ADD REPLY • link 5.8 years ago by Kevin Blighe 88k

0

Entering edit mode

Thanks Kevin,

Tried that, but it throws an "Design matrix not of full rank" error.

ADD REPLY • link 5.8 years ago by Pietro ▴ 240

0

Entering edit mode

You can't include both cellLine and IC, they're mutually exclusive.

ADD REPLY • link 5.8 years ago by Devon Ryan 105k

0

Entering edit mode

Hi Devon,

Realized that. So, what do you suggest to account for cellLine in my design?

Thanks

ADD REPLY • link 5.8 years ago by Pietro ▴ 240

0

Entering edit mode

Remove IC and perform any relevant grouping of it as needed during a contrast.

ADD REPLY • link 5.8 years ago by Devon Ryan 105k

0

Entering edit mode

You mean

design <- model.matrix(~ 0 + cellLine + time, 
                       levels = samples_table)

?

And then I have

     a   b    c    d   e     f time48 time168
1    1   0    0    0   0     0      0       0
2    1   0    0    0   0     0      1       0
3    1   0    0    0   0     0      0       1
4    0   1    0    0   0     0      0       0
5    0   1    0    0   0     0      1       0
6    0   1    0    0   0     0      0       1
7    0   0    1    0   0     0      0       0
8    0   0    1    0   0     0      1       0
9    0   0    1    0   0     0      0       1
10   0   0    0    1   0     0      0       0
11   0   0    0    1   0     0      1       0
12   0   0    0    1   0     0      0       1
13   0   0    0    0   1     0      0       0
14   0   0    0    0   1     0      1       0
15   0   0    0    0   1     0      0       1
16   0   0    0    0   0     1      0       0
17   0   0    0    0   0     1      1       0
18   0   0    0    0   0     1      0       1

How do I specify that I want time48 only for cell lines e and f against time48 for the other 4 cell lines? Thanks

ADD REPLY • link 5.8 years ago by Pietro ▴ 240

0

Entering edit mode

If you want specific group comparisons like that it's more convenient to make groups like a_time48, a_time168 and use ~0 + group as a model.

ADD REPLY • link 5.8 years ago by Devon Ryan 105k