Hi,
Experiment design
I will have results from three-way RNA-seq experiment. Now I'm trying to prepare for it's analysis.
There will be 4 maize lines (for now denoted: "A", "B", "C", "D") and two treatments: cold ("x") and regrowth ("c"). For each treatment there will be say 8 time points (in my simulated data only 4).
In fact plants will be grown in cold ("x") treatment first, after which the temperature will be raised for regrowth ("c"). At each time point leaves from three plant per line will be pooled as a sample. And these plants would be discarded.
So for each line it could be either analyzed in such a way
- that cold and regrowth profiles are joined or
- as a separate cold and regrowth profiles.
I think 1. would be better for clustering, but option 2. seems better from the testing point of view.
Biological questions
1. What are differences between treatment (cold vs regrowth) time profiles for each line?
It could be analyzed for each line separately, say in DEseq2
.
2. Do lines react differently to treatments? It involves line:treatment interaction, so it could be only tested in two-way analysis.
My attempts
I tried to expand the example from DEseq2
to include second factor https://master.bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#time-course-experiments
As three-way analysis gave "matrix not full rank" error I've joined line and treatment in one variable, named group as proposed in http://127.0.0.1:13491/library/DESeq2/doc/DESeq2.html#interactions
So my colData
is
> sym.coldata
line treatment time group
Ax15.1 A x 15 Ax
Ax15.2 A x 15 Ax
Ax15.3 A x 15 Ax
Ax22.1 A x 22 Ax
Ax22.2 A x 22 Ax
Ax22.3 A x 22 Ax
Ax29.1 A x 29 Ax
Ax29.2 A x 29 Ax
Ax29.3 A x 29 Ax
Ax36.1 A x 36 Ax
Ax36.2 A x 36 Ax
Ax36.3 A x 36 Ax
Ac15.1 A c 15 Ac
Ac15.2 A c 15 Ac
Ac15.3 A c 15 Ac
Ac22.1 A c 22 Ac
Ac22.2 A c 22 Ac
Ac22.3 A c 22 Ac
Ac29.1 A c 29 Ac
Ac29.2 A c 29 Ac
Ac29.3 A c 29 Ac
Ac36.1 A c 36 Ac
Ac36.2 A c 36 Ac
Ac36.3 A c 36 Ac
Bx15.1 B x 15 Bx
Bx15.2 B x 15 Bx
Bx15.3 B x 15 Bx
Bx22.1 B x 22 Bx
Bx22.2 B x 22 Bx
Bx22.3 B x 22 Bx
Bx29.1 B x 29 Bx
Bx29.2 B x 29 Bx
Bx29.3 B x 29 Bx
Bx36.1 B x 36 Bx
Bx36.2 B x 36 Bx
Bx36.3 B x 36 Bx
Bc15.1 B c 15 Bc
Bc15.2 B c 15 Bc
Bc15.3 B c 15 Bc
Bc22.1 B c 22 Bc
Bc22.2 B c 22 Bc
Bc22.3 B c 22 Bc
Bc29.1 B c 29 Bc
Bc29.2 B c 29 Bc
Bc29.3 B c 29 Bc
Bc36.1 B c 36 Bc
Bc36.2 B c 36 Bc
Bc36.3 B c 36 Bc
Cx15.1 C x 15 Cx
Cx15.2 C x 15 Cx
Cx15.3 C x 15 Cx
Cx22.1 C x 22 Cx
Cx22.2 C x 22 Cx
Cx22.3 C x 22 Cx
Cx29.1 C x 29 Cx
Cx29.2 C x 29 Cx
Cx29.3 C x 29 Cx
Cx36.1 C x 36 Cx
Cx36.2 C x 36 Cx
Cx36.3 C x 36 Cx
Cc15.1 C c 15 Cc
Cc15.2 C c 15 Cc
Cc15.3 C c 15 Cc
Cc22.1 C c 22 Cc
Cc22.2 C c 22 Cc
Cc22.3 C c 22 Cc
Cc29.1 C c 29 Cc
Cc29.2 C c 29 Cc
Cc29.3 C c 29 Cc
Cc36.1 C c 36 Cc
Cc36.2 C c 36 Cc
Cc36.3 C c 36 Cc
Dx15.1 D x 15 Dx
Dx15.2 D x 15 Dx
Dx15.3 D x 15 Dx
Dx22.1 D x 22 Dx
Dx22.2 D x 22 Dx
Dx22.3 D x 22 Dx
Dx29.1 D x 29 Dx
Dx29.2 D x 29 Dx
Dx29.3 D x 29 Dx
Dx36.1 D x 36 Dx
Dx36.2 D x 36 Dx
Dx36.3 D x 36 Dx
Dc15.1 D c 15 Dc
Dc15.2 D c 15 Dc
Dc15.3 D c 15 Dc
Dc22.1 D c 22 Dc
Dc22.2 D c 22 Dc
Dc22.3 D c 22 Dc
Dc29.1 D c 29 Dc
Dc29.2 D c 29 Dc
Dc29.3 D c 29 Dc
Dc36.1 D c 36 Dc
Dc36.2 D c 36 Dc
Dc36.3 D c 36 Dc
Analysis
I've successfully created dds
object of DEseq
class
> dds2=DESeqDataSetFromMatrix(countData=cts2, colData=sym.coldata, design= ~ 0 + group + time + group:time)
converting counts to integer mode
I've removed intercept to make contrasts more readable as proposed here https://bioinformatics.stackexchange.com/a/154/2481 and here https://ucdavis-bioinformatics-training.github.io/2019-March-Bioinformatics-Prerequisites/thursday/linear_models.html in "Another parametrization" section.
Likelihood Ratio Test on reduced model
> dds2=DESeq(dds2, test="LRT", reduced = ~ 0 + group + time)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> res2=results(dds2)
From this point I don't know what to do
I can extract coefficients
> resultsNames(dds2)
[1] "groupAc" "groupAx" "groupBc" "groupBx"
[5] "groupCc" "groupCx" "groupDc" "groupDx"
[9] "time22" "time29" "time36" "groupAx.time22"
[13] "groupBc.time22" "groupBx.time22" "groupCc.time22" "groupCx.time22"
[17] "groupDc.time22" "groupDx.time22" "groupAx.time29" "groupBc.time29"
[21] "groupBx.time29" "groupCc.time29" "groupCx.time29" "groupDc.time29"
[25] "groupDx.time29" "groupAx.time36" "groupBc.time36" "groupBx.time36"
[29] "groupCc.time36" "groupCx.time36" "groupDc.time36" "groupDx.time36"
Of course the p-value from the test would say if there is any effect of group across time (not informative at all). So I'd have to make my own contrasts.
I understand that eg. groupAc
is a coefficient for group "Ac" (line "A" in treatment "c") at time "15" (which, as I can see has been set as reference time).
So to get treatment effect for line "A" at time "15" I'd have to make following contrast
res15 <- results(dds2, contrast = c("group", "Ax", "Ac"))
Similarly, to get treatment effect for line "B" at time "22" I'd have to compare "groupBc.time22" and "groupBx.time22".
I wonder, where is coefficient for "time15"?
I understand that for my Biological questions given above I'd have to make more complicated contrasts, involving combination (sums?) of coefficients. Is it possible in DEseq2? How to do that? I also think about maSigPro
package, but for now I'd like to try DEseq2
.